Correlates of Homelessness

```r
library(ggplot2)
library(ggthemes)
library(scales)
library(tidyverse)
library(knitr)
library(sf)
library(stargazer)
library(data.table)
library(ggmap)
library(lwgeom)
library(units)
library(lfe)
library(vtable)

#---- Load the main datasets
d.ACS <- data.table(read.csv("data/processed/ACS1YR_by_county_year.csv"))
d.PIT <- data.table(read.csv("data/processed/PIT_by_CoC_Year.csv"))
xwalk <- data.table(read.csv("data/HUD-CoC-Geography-Crosswalk/county_coc_match.csv"))


#---- Crosswalk ACS data to coc-year

# identifier columns, variables that are averages (re-weighted), and
# everything else, which is treated as a count (apportioned and summed)
key.vars <- c("state", "county", "year", "NAME")
mean.vars <- c("MedianIncome", "MeanIncome", "HomeownerVacRate", "RentalVacRate")
count.vars <- setdiff(names(d.ACS), union(key.vars, mean.vars))

# merge CoCs onto counties using a numeric county FIPS code (state * 1000 + county)
d.ACS[, county_fips := state * 1000 + county]
d2 <- inner_join(d.ACS, xwalk, by = "county_fips")

# collapse mean variables to the CoC-year level, weighting each county by
# the percentage of its population that falls in the CoC
d3 <- d2 %>%
  group_by(coc_number, coc_name, year) %>%
  summarize_at(mean.vars, ~ weighted.mean(.x, pct_cnty_pop_coc, na.rm = TRUE)) %>%
  ungroup()

# weighted sum that ignores missing values
sum.partial <- function(x, w) {
  sum(x * w, na.rm = TRUE)
}

# apportion count variables by the CoC's share of each county's population
d4 <- d2 %>%
  group_by(coc_number, coc_name, year) %>%
  summarize_at(count.vars, ~ sum.partial(.x, pct_cnty_pop_coc / 100)) %>%
  ungroup()

d <- inner_join(d3, select(d4, -coc_name), by = c("coc_number", "year"))


#---- Merge ACS data with PIT data
d.PIT <- d.PIT %>%
  rename(
    year = Year,
    coc_number = CoC.Id
  )
d <- data.table(inner_join(d, d.PIT, by = c("coc_number", "year")))


#---- Derived variables (rates and shares are in percent)
d[, unemp.rate := Unemployed / CivLaborForce * 100]
d[, pct.black := Black / TotalPopulation * 100]
d[, pct.hisp := Hispanic / TotalPopulation * 100]
# occupied units with more than 1 occupant per room
d[, pct.crowded := (OwnOccHU1.01to1.5 + OwnOccHU1.51to2 + OwnOccHU2.01plus +
                      RentOccHU1.01to1.5 + RentOccHU1.51to2 + RentOccHU2.01plus) /
                     (OwnOccHU + RentOccHU) * 100]
# renters paying 30% or more of income, among those with a computed ratio
d[, pct.highrent := (RTI_30to35 + RTI_35to40 + RTI_40to50 + RTI_50plus) / (RenterOccHU - RTI_NA) * 100]
d[, homeless.rate := Homeless / (TotalPopulation / 100000)]   # per 100,000 residents
d[, rooms.per.pop := TotalRooms / TotalPopulation]
d[, ln.median.inc := log(MedianIncome)]
d[, poverty.rate := Poverty / Pop.Poverty.Determined * 100]
d[, pct.bachelors := (Educ.Bac + Educ.Grad) / Pop.25.Over * 100]
```

In this post, I explore the correlates of homelessness using data from the HUD Point-in-Time (PIT) counts and the American Community Survey (ACS, 1-year estimates). The unit of analysis is the CoC-year. A separate post explains the concept of a CoC (Continuum of Care) and the HUD data. ACS data are collected at the county-year level and aggregated up to CoCs using the county-to-CoC crosswalk from Tom Byrne. The data cover 372 CoCs for the years 2007 to 2019. I focus on the following variables:

homeless.rate: The number of homeless people in the CoC per 100,000 population
rooms.per.pop: The total number of rooms in the CoC (rooms in regular housing units, not shelter rooms) divided by the population
pct.crowded: The percentage of occupied housing units with more than 1 occupant per room, which is considered crowded. The Census counts living rooms, kitchens, and dining rooms as rooms but not bathrooms, so three people living in an apartment with 1 bedroom, 1 kitchen, and 1 living room would have exactly 1 occupant per room and would not count as crowded.
pct.highrent: The percentage of renter households spending 30% or more of their income on rent (among renters for whom this ratio is computed)
ln.median.inc: The natural log of the median household income in the CoC
unemp.rate: The unemployment rate in the CoC, in percent
poverty.rate: The percentage of people in the CoC with income below 100% of the poverty level
pct.bachelors: The percentage of people aged 25 or older in the CoC with a bachelor's degree or higher
pct.black: The percentage of people in the CoC who are Black
pct.hisp: The percentage of people in the CoC who are Hispanic
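To make the crosswalk step concrete, here is a toy sketch in base R of how county values are rolled up to CoCs. The two counties, their values, and the CoC codes XX-500/XX-501 are made up; only the column names (county_fips, coc_number, pct_cnty_pop_coc, TotalPopulation, MedianIncome) mirror the code at the top of the post. Count variables are apportioned by the share of each county's population in the CoC and summed, while mean variables are averaged using pct_cnty_pop_coc as the weight.

```r
# Toy data (hypothetical): county 1001 is split 60/40 between two CoCs,
# county 1003 lies entirely inside CoC "XX-500".
acs <- data.frame(
  county_fips     = c(1001, 1003),
  TotalPopulation = c(100000, 40000),  # a count variable
  MedianIncome    = c(50000, 70000)    # a mean variable
)
xwalk_toy <- data.frame(
  county_fips      = c(1001, 1001, 1003),
  coc_number       = c("XX-500", "XX-501", "XX-500"),
  pct_cnty_pop_coc = c(60, 40, 100)    # % of the county's population in the CoC
)
d2_toy <- merge(acs, xwalk_toy, by = "county_fips")

# Counts: apportion each county's total by its population share, then sum by CoC
d2_toy$pop_share <- d2_toy$TotalPopulation * d2_toy$pct_cnty_pop_coc / 100
coc_pop <- sapply(split(d2_toy$pop_share, d2_toy$coc_number), sum)
# values: 100000 for XX-500, 40000 for XX-501

# Means: weighted average of county values, weights = pct_cnty_pop_coc
by_coc  <- split(d2_toy, d2_toy$coc_number)
coc_inc <- sapply(by_coc, function(g) weighted.mean(g$MedianIncome, g$pct_cnty_pop_coc))
# values: 62500 for XX-500, 50000 for XX-501

# Alternative, for comparison: weight by the apportioned population instead
coc_inc_popw <- sapply(by_coc, function(g) weighted.mean(g$MedianIncome, g$pop_share))
# values: 58000 for XX-500, 50000 for XX-501
```

One design choice worth noting: weighting county means by pct_cnty_pop_coc alone does not account for differences in county size, so in this toy case it gives 62,500 for XX-500, whereas weighting by the apportioned population gives 58,000.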

Summary Statistics

Summary statistics are shown here:

```r
# summary statistics, dropping CoC-years where homeless.rate is infinite
d %>%
  subset(!is.infinite(homeless.rate)) %>%
  sumtable(
    vars = c(
      "homeless.rate",
      "rooms.per.pop",
      "pct.crowded",
      "pct.highrent",
      "ln.median.inc",
      "unemp.rate",
      "poverty.rate",
      "pct.bachelors",
      "pct.black",
      "pct.hisp"
    )
  )
```

**Summary Statistics**

| Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
|---|---|---|---|---|---|---|---|
| homeless.rate | 4753 | 244.161 | 282.856 | 17.76 | 107.645 | 266.814 | 4275.583 |
| rooms.per.pop | 4753 | 2.484 | 0.312 | 1.605 | 2.334 | 2.675 | 4.543 |
| pct.crowded | 4317 | 2.955 | 2.239 | 0.365 | 1.577 | 3.423 | 15.256 |
| pct.highrent | 4753 | 50.953 | 5.638 | 29.593 | 47.005 | 54.621 | 70.218 |
| ln.median.inc | 4753 | 10.934 | 0.252 | 10.27 | 10.748 | 11.089 | 11.93 |
| unemp.rate | 4750 | 7.344 | 3.008 | 1.426 | 5.104 | 9.076 | 23.276 |
| poverty.rate | 4753 | 13.92 | 4.835 | 2.275 | 10.576 | 17.009 | 39.623 |
| pct.bachelors | 4753 | 31.155 | 9.824 | 10.094 | 23.959 | 36.37 | 76.158 |
| pct.black | 4753 | 13.712 | 12.357 | 0 | 4.751 | 18.247 | 66.194 |
| pct.hisp | 4753 | 12.792 | 13.189 | 0 | 4.427 | 15.853 | 85.03 |
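One way to quantify how much of the variation in homeless.rate is across CoCs rather than over time is a between/within variance decomposition. The helper below (a hypothetical name, base R only) splits the total sum of squares of y into the part captured by group means (between CoCs) and the deviations from those means (within CoCs, over time). The between share equals the R-squared of a regression of y on CoC dummies alone, so it is closely related to how much the R-squared rises when CoC fixed effects are added. The toy numbers are made up.

```r
# Split the variance of y into a between-group part (differences in group
# means) and a within-group part (deviations from each group's own mean).
variance_shares <- function(y, group) {
  group_mean <- ave(y, group)               # each observation's group mean
  total      <- sum((y - mean(y))^2)
  between    <- sum((group_mean - mean(y))^2)
  within     <- sum((y - group_mean)^2)     # between + within equals total
  c(between = between / total, within = within / total)
}

# Toy example: two hypothetical CoCs observed for three years each
toy <- variance_shares(
  y     = c(100, 110, 90, 500, 520, 480),
  group = c("A", "A", "A", "B", "B", "B")
)
round(toy, 3)   # between 0.996, within 0.004

# On the post's data (not run here):
# with(d[is.finite(homeless.rate)], variance_shares(homeless.rate, coc_number))
```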

Regression Results

To investigate the correlates of homelessness, I run linear regressions with homeless.rate as the dependent variable. The results are reported in the table below. Column (1) includes no fixed effects, column (2) includes year fixed effects, column (3) includes CoC fixed effects, and column (4) includes both year and CoC fixed effects.
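To see what the CoC fixed effects in columns (3) and (4) do, here is a small simulation in base R (made-up data, not the HUD/ACS panel). Including a dummy for every CoC gives exactly the same slope as regressing CoC-demeaned y on CoC-demeaned x (the Frisch-Waugh-Lovell theorem), so the fixed-effects estimate uses only variation within a CoC over time. When a persistent CoC trait is correlated with x, the pooled regression without fixed effects is badly biased. The `| coc_number` part of the felm formula absorbs the same CoC dummies; lm is used here only to keep the sketch dependency-free.

```r
set.seed(1)
n_coc  <- 20
n_year <- 5
panel  <- expand.grid(coc = factor(1:n_coc), year = 1:n_year)

# A persistent, unobserved CoC trait that affects both x and y
coc_effect <- rnorm(n_coc, sd = 100)
trait <- coc_effect[as.integer(panel$coc)]
panel$x <- rnorm(nrow(panel)) + trait / 100
panel$y <- 2 * panel$x + trait + rnorm(nrow(panel))   # true slope on x is 2

# (a) CoC fixed effects as explicit dummy variables
b_dummy <- coef(lm(y ~ x + coc, data = panel))[["x"]]

# (b) Within transformation: subtract each CoC's mean from y and x
y_within <- panel$y - ave(panel$y, panel$coc)
x_within <- panel$x - ave(panel$x, panel$coc)
b_within <- coef(lm(y_within ~ x_within - 1))[["x_within"]]

# (c) Pooled OLS with no fixed effects
b_pooled <- coef(lm(y ~ x, data = panel))[["x"]]

c(dummies = b_dummy, within = b_within, pooled = b_pooled)
```

Here b_dummy and b_within agree to numerical precision and are close to the true slope of 2, while b_pooled absorbs the CoC trait and is far larger.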

```r
# felm formula: outcome ~ regressors | fixed effects | IV | cluster (0 = none)
# (1) no fixed effects
r1 <- felm(homeless.rate ~ rooms.per.pop + pct.crowded + pct.highrent +
             ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
           | 0 | 0 | 0,
           data = d)
# (2) year fixed effects
r2 <- felm(homeless.rate ~ rooms.per.pop + pct.crowded + pct.highrent +
             ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
           | year | 0 | 0,
           data = d)
# (3) CoC fixed effects
r3 <- felm(homeless.rate ~ rooms.per.pop + pct.crowded + pct.highrent +
             ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
           | coc_number | 0 | 0,
           data = d)
# (4) year and CoC fixed effects
r4 <- felm(homeless.rate ~ rooms.per.pop + pct.crowded + pct.highrent +
             ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
           | year + coc_number | 0 | 0,
           data = d)

stargazer(r1, r2, r3, r4, type = "html",
          title = "Correlates of Homelessness",
          no.space = TRUE,
          omit = c("Constant"),
          add.lines = list(
            c("Year FE", " ", "Y", " ", "Y"),
            c("CoC FE", " ", " ", "Y", "Y")
          ))
```

**Correlates of Homelessness**

Dependent variable: homeless.rate

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| rooms.per.pop | -40.113** | -33.523 | 49.967 | 126.408*** |
| | (19.880) | (20.999) | (33.420) | (36.393) |
| pct.crowded | 36.026*** | 36.039*** | -2.678 | -0.600 |
| | (3.142) | (3.145) | (3.326) | (3.345) |
| pct.highrent | 2.834*** | 2.893*** | 1.001 | 1.461* |
| | (0.828) | (0.829) | (0.758) | (0.764) |
| ln.median.inc | -308.950*** | -298.195*** | 4.359 | 255.459*** |
| | (40.225) | (47.520) | (52.164) | (64.066) |
| unemp.rate | 9.192*** | 16.523*** | 5.530*** | 0.735 |
| | (1.628) | (2.301) | (1.323) | (1.849) |
| poverty.rate | -1.430 | -0.379 | -3.139* | 1.123 |
| | (1.783) | (2.114) | (1.666) | (1.843) |
| pct.bachelors | 6.208*** | 7.005*** | -6.294*** | -1.059 |
| | (0.627) | (0.679) | (1.699) | (1.837) |
| pct.black | -1.970*** | -2.476*** | 0.603 | 9.864*** |
| | (0.358) | (0.367) | (2.928) | (3.134) |
| pct.hisp | -3.310*** | -3.603*** | -7.828*** | 3.973 |
| | (0.470) | (0.472) | (2.254) | (2.764) |
| Year FE | | Y | | Y |
| CoC FE | | | Y | Y |
| Observations | 4,315 | 4,315 | 4,315 | 4,315 |
| R² | 0.111 | 0.120 | 0.778 | 0.782 |
| Adjusted R² | 0.109 | 0.115 | 0.757 | 0.761 |
| Residual Std. Error | 249.407 (df = 4305) | 248.502 (df = 4293) | 130.328 (df = 3940) | 129.303 (df = 3928) |

Note: * p<0.1; ** p<0.05; *** p<0.01

Because the distribution of homeless.rate is quite skewed, outliers could have an undue effect on the results, so I also ran the regressions with log(homeless.rate) as the dependent variable. The results, reported below, are qualitatively similar to those in levels, although a few coefficients gain significance (for example, ln.median.inc and pct.black in column (3)).

```r
# same four specifications, with log(homeless.rate) as the outcome
r1 <- felm(log(homeless.rate) ~ rooms.per.pop + pct.crowded + pct.highrent +
             ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
           | 0 | 0 | 0,
           data = d)
r2 <- felm(log(homeless.rate) ~ rooms.per.pop + pct.crowded + pct.highrent +
             ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
           | year | 0 | 0,
           data = d)
r3 <- felm(log(homeless.rate) ~ rooms.per.pop + pct.crowded + pct.highrent +
             ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
           | coc_number | 0 | 0,
           data = d)
r4 <- felm(log(homeless.rate) ~ rooms.per.pop + pct.crowded + pct.highrent +
             ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
           | year + coc_number | 0 | 0,
           data = d)

stargazer(r1, r2, r3, r4, type = "html",
          title = "Correlates of Homelessness",
          no.space = TRUE,
          omit = c("Constant"),
          add.lines = list(
            c("Year FE", " ", "Y", " ", "Y"),
            c("CoC FE", " ", " ", "Y", "Y")
          ))
```

**Correlates of Homelessness**

Dependent variable: log(homeless.rate)

| | (1) | (2) | (3) | (4) |
|---|---|---|---|---|
| rooms.per.pop | -0.322*** | -0.281*** | 0.092 | 0.402*** |
| | (0.054) | (0.057) | (0.073) | (0.078) |
| pct.crowded | 0.118*** | 0.119*** | -0.003 | 0.005 |
| | (0.009) | (0.009) | (0.007) | (0.007) |
| pct.highrent | 0.012*** | 0.012*** | 0.002 | 0.004** |
| | (0.002) | (0.002) | (0.002) | (0.002) |
| ln.median.inc | -1.358*** | -1.251*** | -0.265** | 0.702*** |
| | (0.109) | (0.129) | (0.113) | (0.137) |
| unemp.rate | 0.020*** | 0.034*** | 0.016*** | 0.001 |
| | (0.004) | (0.006) | (0.003) | (0.004) |
| poverty.rate | -0.011** | -0.004 | -0.009** | 0.002 |
| | (0.005) | (0.006) | (0.004) | (0.004) |
| pct.bachelors | 0.021*** | 0.022*** | -0.017*** | -0.001 |
| | (0.002) | (0.002) | (0.004) | (0.004) |
| pct.black | -0.007*** | -0.008*** | -0.015** | 0.018*** |
| | (0.001) | (0.001) | (0.006) | (0.007) |
| pct.hisp | -0.011*** | -0.011*** | -0.029*** | 0.009 |
| | (0.001) | (0.001) | (0.005) | (0.006) |
| Year FE | | Y | | Y |
| CoC FE | | | Y | Y |
| Observations | 4,315 | 4,315 | 4,315 | 4,315 |
| R² | 0.194 | 0.201 | 0.871 | 0.877 |
| Adjusted R² | 0.192 | 0.197 | 0.859 | 0.865 |
| Residual Std. Error | 0.677 (df = 4305) | 0.675 (df = 4293) | 0.283 (df = 3940) | 0.277 (df = 3928) |

Note: * p<0.1; ** p<0.05; *** p<0.01
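Coefficients in the log specification are easier to read as percentage changes. For a regressor entering in levels, a change of dx implies an exact change of 100 * (exp(b * dx) - 1) percent in homeless.rate; for ln.median.inc the coefficient is an elasticity, so a 10% income increase corresponds to dx = log(1.1). This sketch applies that formula to the rounded column (4) coefficients from the table above; the helper name pct_change is my own.

```r
# Percentage change in homeless.rate implied by a coefficient b on
# log(homeless.rate) when the regressor changes by dx
pct_change <- function(b, dx) 100 * (exp(b * dx) - 1)

# Rounded column (4) coefficients copied from the log table
b_rooms <- 0.402   # rooms.per.pop
b_rent  <- 0.004   # pct.highrent
b_lninc <- 0.702   # ln.median.inc (an elasticity, since income is in logs)

pct_change(b_rooms, 0.5)        # +0.5 rooms per person: about 22.3%
pct_change(b_rent, 10)          # +10 pp in pct.highrent: about 4.1%
pct_change(b_lninc, log(1.1))   # +10% median income: about 6.9%
```

Because the table rounds coefficients to three decimals, the pct.highrent figure in particular is only a rough guide.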

Discussion

(1) No Fixed Effects. Without fixed effects, many of the variables are statistically significant and most have the expected sign. The results suggest that there is more homelessness when there is less housing per person (rooms.per.pop), when conditions are more crowded (pct.crowded), and when housing is expensive relative to income (pct.highrent). Homelessness is also higher in poorer communities (lower ln.median.inc) and in communities with more unemployment (unemp.rate). Interestingly, homelessness tends to be higher in areas with a more educated population (pct.bachelors) and lower shares of Black and Hispanic residents (pct.black and pct.hisp).

(2) Year Fixed Effects. Adding year fixed effects does not change the overall results much, and the R-squared barely increases (from 0.111 to 0.120). This suggests that cross-sectional variation in homelessness is much larger than temporal variation: the differences in homelessness across CoCs are much larger than the common, nationwide changes in homelessness over time.

(3) CoC Fixed Effects. Adding CoC fixed effects shrinks many of the coefficients and flips the sign of several (rooms.per.pop, pct.crowded, pct.bachelors, and pct.black). Notably, none of the housing market variables (rooms.per.pop, pct.crowded, pct.highrent) remain statistically significant. This suggests that although differences in the housing market can explain differences in the baseline level of homelessness across CoCs, changes in housing market conditions do not explain changes in homelessness over time within CoCs. The R-squared jumps from 0.111 to 0.778 when CoC fixed effects are added, suggesting again that there is a large amount of persistent heterogeneity in homeless rates across CoCs.

(4) Year and CoC Fixed Effects. Adding both CoC and year fixed effects changes the sign of a number of coefficients. The coefficients on rooms.per.pop and ln.median.inc become large, positive, and significant. Taken at face value, a 10% increase in median income is associated with about 25 more homeless per 100,000 people, roughly a tenth of a standard deviation of homeless.rate. An increase of half a room per person (about 1.6 standard deviations of rooms.per.pop) is associated with roughly 63 more homeless per 100,000 people. These results are strange because they suggest that CoCs with more housing supply and higher incomes also have more homelessness, after controlling for national trends (year fixed effects) and baseline CoC levels (CoC fixed effects). But perhaps this is not so strange given that we know the largest increases in homelessness since 2007 have happened in California and New York. It may be that CoCs that experienced rapid economic growth also left many people behind, leading to a rise in homelessness. The positive (marginally significant) coefficient on pct.highrent is consistent with this, as the rising housing costs that come with economic growth may have contributed to increased homelessness. Yet this does not seem to be the whole story: based on these results, a 10 percentage point increase in the share of renters paying 30% or more of their income in rent is associated with only about 15 more homeless per 100,000.
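The magnitudes quoted in (4) can be reproduced with a few lines of arithmetic. This sketch copies the rounded column (4) coefficients from the levels table and the standard deviations from the summary statistics table; the variable names are my own. Using the exact log change for a 10% increase, log(1.1) ≈ 0.095, gives about 24 homeless per 100,000 rather than the 25.5 implied by approximating the log change as 0.1; either way the effect is roughly a tenth of a standard deviation.

```r
# Rounded column (4) coefficients from the levels table
b_lninc <- 255.459   # ln.median.inc
b_rooms <- 126.408   # rooms.per.pop
b_rent  <- 1.461     # pct.highrent
# Standard deviations from the summary statistics table
sd_homeless <- 282.856
sd_rooms    <- 0.312

effect_income <- b_lninc * log(1.10)   # 10% higher median income
effect_rooms  <- b_rooms * 0.5         # half a room more per person
effect_rent   <- b_rent * 10           # +10 percentage points in pct.highrent

round(c(income = effect_income, rooms = effect_rooms, rent = effect_rent), 1)
# income 24.3, rooms 63.2, rent 14.6 (homeless per 100,000)
round(effect_income / sd_homeless, 2)  # 0.09 SD of homeless.rate
round(0.5 / sd_rooms, 1)               # half a room is 1.6 SD of rooms.per.pop
```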

Conclusion

Here are my takeaways from today’s exercise:

There is much more cross-sectional variation than temporal variation in homelessness.
Housing market variables can explain some of the cross-sectional variation in homelessness, but with CoC fixed effects alone (column 3) they are not statistically significant in explaining changes in homelessness within CoCs.
CoCs whose homelessness rose faster than the national trend also had faster income growth and faster growth in housing supply, as measured by rooms per person.

This was a limited exercise using a small number of variables. Still, it suggests that there are no easy, straightforward explanations for what drove changes in homelessness within CoCs from 2007 to 2019. We will have to keep looking for other explanations.
