Correlates of Homelessness

    library(ggplot2)
    library(ggthemes)
    library(scales)
    library(tidyverse)
    library(knitr)
    library(sf)
    library(stargazer)
    library(data.table)
    library(ggmap)
    library(lwgeom)
    library(units)
    library(lfe)
    library(vtable)

    #---- Load the main datasets
    d.ACS <- data.table(read.csv("data/processed/ACS1YR_by_county_year.csv"))
    d.PIT <- data.table(read.csv("data/processed/PIT_by_CoC_Year.csv"))
    xwalk <- data.table(read.csv("data/HUD-CoC-Geography-Crosswalk/county_coc_match.csv"))

    #---- Crosswalk ACS data to coc-year
    key.vars <- c("state","county","year","NAME")
    mean.vars <- c("MedianIncome", "MeanIncome", "HomeownerVacRate", "RentalVacRate")
    count.vars <- setdiff(names(d.ACS), union(key.vars, mean.vars))

    # merge cocs onto counties
    d.ACS[, county_fips:=state*1000 + county]
    d2 <- inner_join(d.ACS, xwalk, by=c("county_fips"))

    # collapse to coc year level
    d3 <- d2 %>%
      group_by(coc_number, coc_name, year) %>%
      summarize_at(mean.vars, ~ weighted.mean(.x, pct_cnty_pop_coc, na.rm=T)) %>%
      ungroup()

    sum.partial <- function(x,w) {
      sum(x*w, na.rm=T)
    }

    d4 <- d2 %>%
      group_by(coc_number, coc_name, year) %>%
      summarize_at(count.vars, ~ sum.partial(.x, pct_cnty_pop_coc/100)) %>%
      ungroup()

    d <- inner_join(d3, select(d4,-coc_name), by=c("coc_number","year"))

    #---- Merge ACS data with PIT data
    d.PIT <- d.PIT %>%
      rename(
        year = Year,
        coc_number = CoC.Id
      )
    d <- data.table(inner_join(d, d.PIT, by=c("coc_number","year")))

    #---- Derived variables
    d[, unemp.rate:=Unemployed/CivLaborForce*100]
    d[, pct.black:=Black/TotalPopulation*100]
    d[, pct.hisp:=Hispanic/TotalPopulation*100]
    d[, pct.crowded:=(OwnOccHU1.01to1.5+OwnOccHU1.51to2+OwnOccHU2.01plus+
                      RentOccHU1.01to1.5+RentOccHU1.51to2+RentOccHU2.01plus)/(
                      OwnOccHU+RentOccHU)*100]
    d[, pct.highrent:=(RTI_30to35+RTI_35to40+RTI_40to50+RTI_50plus)/(RenterOccHU-RTI_NA)*100]
    d[, homeless.rate:=Homeless/(TotalPopulation/100000)]
    d[, rooms.per.pop:=TotalRooms/(TotalPopulation)]
    d[, ln.median.inc:=log(MedianIncome)]
    d[, poverty.rate:=Poverty/Pop.Poverty.Determined*100]
    d[, pct.bachelors:=(Educ.Bac+Educ.Grad)/(Pop.25.Over)*100]
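
The crosswalk step in the code above treats count variables and mean-type variables differently: counts are summed after scaling by the share of each county's population that falls in the CoC, while means are averaged using those shares as weights. Here is a minimal sketch of that logic on a toy crosswalk. The CoC id "XX-500" and all of the numbers are made up for illustration; only the column names follow the code above.

```r
# Toy crosswalk: two counties feeding one hypothetical CoC ("XX-500").
# All values are invented for illustration.
toy <- data.frame(
  coc_number       = c("XX-500", "XX-500"),
  county_fips      = c(1001, 1003),
  pct_cnty_pop_coc = c(100, 50),       # percent of county population inside the CoC
  TotalPopulation  = c(20000, 10000),  # a count variable
  MedianIncome     = c(40000, 60000)   # a mean-type variable
)

# Counts: add up each county's share-scaled contribution.
coc_pop <- sum(toy$TotalPopulation * toy$pct_cnty_pop_coc / 100)

# Mean-type variables: weighted average, with the shares as weights.
coc_income <- weighted.mean(toy$MedianIncome, toy$pct_cnty_pop_coc)

coc_pop     # 25000
coc_income  # about 46666.67
```

Note that weighting by the share alone does not account for county size. Weighting by share times county population would be one alternative, though that is not what the code above does.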

In this post, I will explore the correlates of homelessness using data from the HUD PIT counts and the ACS. The unit of analysis is the CoC-year. This post explains the concept of a CoC (Continuum of Care) and the HUD data. ACS data is collected at the county-year level and aggregated up to CoCs using the crosswalk from Tom Byrne. The data covers 372 CoCs for the years 2007 to 2019. I focus on the following variables:

  • homeless.rate: The number of homeless in the CoC per 100,000 population
  • rooms.per.pop: The total number of rooms in the CoC (rooms in regular housing units, not shelter rooms) divided by the population
  • pct.crowded: The percentage of housing units with more than 1 occupant per room, which is considered crowded. The Census counts living rooms, kitchens, and dining rooms as rooms but not bathrooms. So three people living in an apartment with 1 bedroom, 1 kitchen, and 1 living room would be 1 occupant per room, which is not counted as crowded.
  • pct.highrent: The percentage of renters spending 30% or more of their income on rent
  • ln.median.inc: The natural log of the median household income in the CoC
  • unemp.rate: The unemployment rate in the CoC
  • poverty.rate: The percentage of people with income below 100% of the poverty level in the CoC
  • pct.bachelors: The percentage of people aged 25 or older with a bachelor's degree or higher in the CoC
  • pct.black: The percentage of people in the CoC who are Black
  • pct.hisp: The percentage of people in the CoC who are Hispanic
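
To make two of these definitions concrete, here is a small sketch that works through the crowding example from the list and a homeless rate per 100,000. The household matches the example above; the CoC counts are hypothetical numbers chosen only to show the arithmetic.

```r
# Crowding: the three-person household from the pct.crowded example.
occupants <- 3
rooms <- 1 + 1 + 1             # bedroom + kitchen + living room; bathrooms are not counted
occ_per_room <- occupants / rooms
is_crowded <- occ_per_room > 1 # crowded means MORE than 1 occupant per room

# Homeless rate per 100,000 for a hypothetical CoC (made-up counts).
homeless <- 1200
population <- 500000
homeless_rate <- homeless / (population / 100000)

occ_per_room   # 1
is_crowded     # FALSE
homeless_rate  # 240
```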

Summary Statistics

Summary statistics are shown here:

    d %>%
      subset(!is.infinite(homeless.rate)) %>%
      sumtable(
        vars=c(
          "homeless.rate",
          "rooms.per.pop",
          "pct.crowded",
          "pct.highrent",
          "ln.median.inc",
          "unemp.rate",
          "poverty.rate",
          "pct.bachelors",
          "pct.black",
          "pct.hisp"
        )
      )

Summary Statistics

Variable        N      Mean      Std. Dev.   Min      Pctl. 25   Pctl. 75   Max
homeless.rate   4753   244.161   282.856     17.76    107.645    266.814    4275.583
rooms.per.pop   4753   2.484     0.312       1.605    2.334      2.675      4.543
pct.crowded     4317   2.955     2.239       0.365    1.577      3.423      15.256
pct.highrent    4753   50.953    5.638       29.593   47.005     54.621     70.218
ln.median.inc   4753   10.934    0.252       10.27    10.748     11.089     11.93
unemp.rate      4750   7.344     3.008       1.426    5.104      9.076      23.276
poverty.rate    4753   13.92     4.835       2.275    10.576     17.009     39.623
pct.bachelors   4753   31.155    9.824       10.094   23.959     36.37      76.158
pct.black       4753   13.712    12.357      0        4.751      18.247     66.194
pct.hisp        4753   12.792    13.189      0        4.427      15.853     85.03

Regression Results

To investigate the correlates of homelessness, I run linear regressions with homeless.rate as the dependent variable. The results are reported in the table below. The first column includes no fixed effects, the second adds year fixed effects, the third uses CoC fixed effects instead, and the last includes both year and CoC fixed effects.

    r1 <- felm(homeless.rate ~ rooms.per.pop + pct.crowded + pct.highrent +
                 ln.median.inc + unemp.rate + poverty.rate + pct.bachelors +
                 pct.black + pct.hisp | 0 | 0 | 0, data=d)
    r2 <- felm(homeless.rate ~ rooms.per.pop + pct.crowded + pct.highrent +
                 ln.median.inc + unemp.rate + poverty.rate + pct.bachelors +
                 pct.black + pct.hisp | year | 0 | 0, data=d)
    r3 <- felm(homeless.rate ~ rooms.per.pop + pct.crowded + pct.highrent +
                 ln.median.inc + unemp.rate + poverty.rate + pct.bachelors +
                 pct.black + pct.hisp | coc_number | 0 | 0, data=d)
    r4 <- felm(homeless.rate ~ rooms.per.pop + pct.crowded + pct.highrent +
                 ln.median.inc + unemp.rate + poverty.rate + pct.bachelors +
                 pct.black + pct.hisp | year + coc_number | 0 | 0, data=d)

    stargazer(r1,r2,r3,r4,type="html",
              title="Correlates of Homelessness",
              no.space=T,
              omit=c("Constant"),
              add.lines =list(
                c("Year FE", " ", "Y", " ", "Y"),
                c("CoC FE", " ", " ", "Y", "Y")
              ))

Correlates of Homelessness
Dependent variable: homeless.rate

                      (1)           (2)           (3)           (4)
rooms.per.pop         -40.113**     -33.523       49.967        126.408***
                      (19.880)      (20.999)      (33.420)      (36.393)
pct.crowded           36.026***     36.039***     -2.678        -0.600
                      (3.142)       (3.145)       (3.326)       (3.345)
pct.highrent          2.834***      2.893***      1.001         1.461*
                      (0.828)       (0.829)       (0.758)       (0.764)
ln.median.inc         -308.950***   -298.195***   4.359         255.459***
                      (40.225)      (47.520)      (52.164)      (64.066)
unemp.rate            9.192***      16.523***     5.530***      0.735
                      (1.628)       (2.301)       (1.323)       (1.849)
poverty.rate          -1.430        -0.379        -3.139*       1.123
                      (1.783)       (2.114)       (1.666)       (1.843)
pct.bachelors         6.208***      7.005***      -6.294***     -1.059
                      (0.627)       (0.679)       (1.699)       (1.837)
pct.black             -1.970***     -2.476***     0.603         9.864***
                      (0.358)       (0.367)       (2.928)       (3.134)
pct.hisp              -3.310***     -3.603***     -7.828***     3.973
                      (0.470)       (0.472)       (2.254)       (2.764)
Year FE                             Y                           Y
CoC FE                                            Y             Y
Observations          4,315         4,315         4,315         4,315
R2                    0.111         0.120         0.778         0.782
Adjusted R2           0.109         0.115         0.757         0.761
Residual Std. Error   249.407       248.502       130.328       129.303
Residual df           4305          4293          3940          3928
Note: *p<0.1; **p<0.05; ***p<0.01

In case outliers in homeless.rate are having an undue effect on the results (the distribution is quite skewed), I also ran the regression using log(homeless.rate) as the dependent variable. The results are reported below, but they are qualitatively similar to the results without taking logs.

    r1 <- felm(log(homeless.rate) ~ rooms.per.pop + pct.crowded + pct.highrent +
                 ln.median.inc + unemp.rate + poverty.rate + pct.bachelors +
                 pct.black + pct.hisp | 0 | 0 | 0, data=d)
    r2 <- felm(log(homeless.rate) ~ rooms.per.pop + pct.crowded + pct.highrent +
                 ln.median.inc + unemp.rate + poverty.rate + pct.bachelors +
                 pct.black + pct.hisp | year | 0 | 0, data=d)
    r3 <- felm(log(homeless.rate) ~ rooms.per.pop + pct.crowded + pct.highrent +
                 ln.median.inc + unemp.rate + poverty.rate + pct.bachelors +
                 pct.black + pct.hisp | coc_number | 0 | 0, data=d)
    r4 <- felm(log(homeless.rate) ~ rooms.per.pop + pct.crowded + pct.highrent +
                 ln.median.inc + unemp.rate + poverty.rate + pct.bachelors +
                 pct.black + pct.hisp | year + coc_number | 0 | 0, data=d)

    stargazer(r1,r2,r3,r4,type="html",
              title="Correlates of Homelessness",
              no.space=T,
              omit=c("Constant"),
              add.lines =list(
                c("Year FE", " ", "Y", " ", "Y"),
                c("CoC FE", " ", " ", "Y", "Y")
              ))

Correlates of Homelessness
Dependent variable: log(homeless.rate)

                      (1)           (2)           (3)           (4)
rooms.per.pop         -0.322***     -0.281***     0.092         0.402***
                      (0.054)       (0.057)       (0.073)       (0.078)
pct.crowded           0.118***      0.119***      -0.003        0.005
                      (0.009)       (0.009)       (0.007)       (0.007)
pct.highrent          0.012***      0.012***      0.002         0.004**
                      (0.002)       (0.002)       (0.002)       (0.002)
ln.median.inc         -1.358***     -1.251***     -0.265**      0.702***
                      (0.109)       (0.129)       (0.113)       (0.137)
unemp.rate            0.020***      0.034***      0.016***      0.001
                      (0.004)       (0.006)       (0.003)       (0.004)
poverty.rate          -0.011**      -0.004        -0.009**      0.002
                      (0.005)       (0.006)       (0.004)       (0.004)
pct.bachelors         0.021***      0.022***      -0.017***     -0.001
                      (0.002)       (0.002)       (0.004)       (0.004)
pct.black             -0.007***     -0.008***     -0.015**      0.018***
                      (0.001)       (0.001)       (0.006)       (0.007)
pct.hisp              -0.011***     -0.011***     -0.029***     0.009
                      (0.001)       (0.001)       (0.005)       (0.006)
Year FE                             Y                           Y
CoC FE                                            Y             Y
Observations          4,315         4,315         4,315         4,315
R2                    0.194         0.201         0.871         0.877
Adjusted R2           0.192         0.197         0.859         0.865
Residual Std. Error   0.677         0.675         0.283         0.277
Residual df           4305          4293          3940          3928
Note: *p<0.1; **p<0.05; ***p<0.01
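
Coefficients in the log specification read differently from those in the levels specification: a coefficient b on a regressor in levels implies roughly a 100*(exp(b*dx) - 1) percent change in the homeless rate for a change of dx, and the coefficient on ln.median.inc is an elasticity. The sketch below applies those standard formulas to two column (4) estimates from the table above. The choice of a half-room change and a 10% income change is just for illustration.

```r
# Column (4) estimates from the log(homeless.rate) table
b_rooms  <- 0.402   # rooms.per.pop (regressor in levels)
b_income <- 0.702   # ln.median.inc (log-log, so an elasticity)

# Percent change in the homeless rate for +0.5 rooms per person
pct_half_room <- 100 * (exp(b_rooms * 0.5) - 1)

# Percent change in the homeless rate for a 10% rise in median income
pct_income_10 <- 100 * (1.10^b_income - 1)

round(pct_half_room, 1)  # roughly 22 percent
round(pct_income_10, 1)  # roughly 7 percent
```

These are associations under the fixed-effects specification, not causal effects.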

Discussion

(1) No Fixed Effects. Without fixed effects, we see that many of the variables are significant for explaining homelessness and most are of the expected sign. The results suggest that there is more homelessness when there is less housing per population (rooms.per.pop), when conditions are more crowded (pct.crowded), and when housing is expensive relative to income (pct.highrent). Homelessness is also higher in poorer communities (ln.median.inc) and communities with more unemployment (unemp.rate). Interestingly, homelessness tends to be higher in areas with a more educated population (pct.bachelors) and a lower fraction of minorities (pct.black and pct.hisp).

(2) Year Fixed Effects. Adding year fixed effects does not change the overall results by much. The R-squared also barely increases (from 0.111 to 0.120). This suggests that cross-sectional variation in homelessness is much larger than temporal variation. That is, the differences in homelessness across CoCs are much larger than the changes in homelessness over time.
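
One way to check a claim like this directly is to split the total variation in a panel variable into a between-CoC part and a within-CoC part. The sketch below does that on simulated data, not on the post's dataset. The panel dimensions and the standard deviations (200 across CoCs, 50 over time) are assumptions chosen to mimic a panel dominated by cross-sectional differences.

```r
set.seed(1)
n_coc <- 50
panel <- expand.grid(coc = seq_len(n_coc), year = 2007:2019)

# Simulated rate: a persistent CoC-specific level plus year-to-year noise
coc_level  <- rnorm(n_coc, mean = 250, sd = 200)
panel$rate <- coc_level[panel$coc] + rnorm(nrow(panel), sd = 50)

# Each observation's CoC mean over all years
coc_mean <- ave(panel$rate, panel$coc)

# Sum-of-squares decomposition: total = between + within
total   <- sum((panel$rate - mean(panel$rate))^2)
between <- sum((coc_mean - mean(panel$rate))^2)
within  <- sum((panel$rate - coc_mean)^2)

between_share <- between / total
between_share  # share of variation that is cross-sectional
```

Running the same decomposition on homeless.rate in `d`, grouped by coc_number, would give the empirical counterpart of the comparison described above.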

(3) CoC Fixed Effects. Adding CoC fixed effects shrinks the estimated coefficients on many of the variables and in many cases switches the sign. Interestingly, none of the housing market variables (rooms.per.pop, pct.crowded, pct.highrent) remains statistically significant. This suggests that although differences in the housing market can explain differences in the baseline level of homelessness across CoCs, changes in housing market conditions do not explain the changes in homelessness over time within CoCs. The R-squared jumps from 0.111 to 0.778 when adding CoC fixed effects, suggesting again that there is a large amount of permanent heterogeneity in homeless rates across CoCs.
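
To see why CoC fixed effects isolate within-CoC changes, it helps to note that including a dummy for every CoC gives the same slope as first subtracting each CoC's own mean from every variable. The sketch below checks this on simulated data using base R's `lm` rather than `felm`. The data-generating process (a true slope of 5 and a regressor correlated with the CoC effect) is an assumption made only for illustration.

```r
set.seed(42)
n_coc <- 30
panel <- expand.grid(coc = factor(seq_len(n_coc)), year = 2007:2019)
idx   <- as.integer(panel$coc)

alpha   <- rnorm(n_coc, sd = 100)                 # permanent CoC-level effect
panel$x <- alpha[idx] / 50 + rnorm(nrow(panel))   # regressor correlated with alpha
panel$y <- 5 * panel$x + alpha[idx] + rnorm(nrow(panel), sd = 10)

# Pooled regression: picks up cross-sectional differences (biased here)
b_pooled <- unname(coef(lm(y ~ x, data = panel))["x"])

# (a) CoC dummies
b_dummy <- unname(coef(lm(y ~ x + coc, data = panel))["x"])

# (b) Within transformation: demean y and x by CoC, then regress
panel$x_dm <- panel$x - ave(panel$x, panel$coc)
panel$y_dm <- panel$y - ave(panel$y, panel$coc)
b_within <- unname(coef(lm(y_dm ~ x_dm - 1, data = panel))["x_dm"])

c(pooled = b_pooled, dummy = b_dummy, within = b_within)
```

The dummy and within estimates coincide, while the pooled estimate differs because it also uses the permanent differences between CoCs.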

(4) Year and CoC Fixed Effects. Adding both CoC and year fixed effects changes the sign on a number of coefficients. The coefficients on rooms.per.pop and ln.median.inc become large, positive, and significant. Taken at face value, a 10% increase in median income is associated with about 25 more homeless per 100,000 people, which is roughly a tenth of a standard deviation. An increase of just half a room per person is associated with about 63 more homeless per 100,000 people. These results are strange because they suggest that CoCs with more housing supply and higher incomes also have more homelessness, after controlling for national trends (year fixed effects) and baseline CoC levels (CoC fixed effects). But perhaps it's not so strange, given that the largest increases in homelessness since 2007 have happened in California and New York. It may be that CoCs that experienced rapid economic growth also left many people behind, producing a rise in homelessness. The positive coefficient on pct.highrent is consistent with this, as the increase in housing costs that comes with economic growth may have contributed to increased homelessness. Yet that doesn't seem to be the whole story: based on these results, a 10 percentage point increase in the share of renters paying over 30% of their income in rent is associated with only about 15 more homeless per 100,000.
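
The magnitudes quoted in this paragraph follow from simple arithmetic on the column (4) coefficients of the levels regression and the standard deviation of homeless.rate from the summary table. A quick sketch of that arithmetic, using `log(1.10)` for the 10% income change (multiplying by 0.10 instead gives a very similar number):

```r
# Column (4) coefficients, homeless.rate in levels
b_income   <- 255.459   # ln.median.inc
b_rooms    <- 126.408   # rooms.per.pop
b_highrent <- 1.461     # pct.highrent
sd_rate    <- 282.856   # Std. Dev. of homeless.rate (summary table)

income_effect <- b_income * log(1.10)   # 10% higher median income
rooms_effect  <- b_rooms * 0.5          # half a room more per person
rent_effect   <- b_highrent * 10        # +10 percentage points high-rent share

round(income_effect, 1)            # about 24.3 per 100,000
round(income_effect / sd_rate, 2)  # about 0.09 standard deviations
round(rooms_effect, 1)             # about 63.2 per 100,000
round(rent_effect, 1)              # about 14.6 per 100,000
```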

Conclusion

Here are my takeaways from today’s exercise:

  • There is much more cross-sectional variation than temporal variation in homelessness.
  • Housing market variables can explain some of the cross-sectional variation in homelessness, but with CoC fixed effects alone none of them is statistically significant in explaining changes in homelessness within CoCs.
  • CoCs with homelessness increases that were above trend also had higher income growth and higher growth in housing supply as measured by rooms per person.

This was a fairly limited exercise on a limited number of variables. Still, it shows that there don't seem to be any easy, straightforward explanations for what drove changes in homelessness within CoCs from 2007 to 2019. We will have to continue looking for other explanations.