Code
library(ggplot2)
library(ggthemes)
library(scales)
library(tidyverse)
library(knitr)
library(sf)
library(stargazer)
library(data.table)
library(ggmap)
library(lwgeom)
library(units)
library(lfe)
library(vtable)
#---- Load the main datasets
d.ACS <- data.table(read.csv("data/processed/ACS1YR_by_county_year.csv"))
d.PIT <- data.table(read.csv("data/processed/PIT_by_CoC_Year.csv"))
xwalk <- data.table(read.csv("data/HUD-CoC-Geography-Crosswalk/county_coc_match.csv"))
#---- Crosswalk ACS data to coc-year
key.vars <- c("state","county","year","NAME")
mean.vars <- c("MedianIncome", "MeanIncome", "HomeownerVacRate", "RentalVacRate")
count.vars <- setdiff(names(d.ACS), union(key.vars, mean.vars))
# merge cocs onto counties
d.ACS[, county_fips:=state*1000 + county]
d2 <- inner_join(d.ACS, xwalk, by=c("county_fips"))
# collapse to coc year level
# mean-type variables: average across counties, weighted by pct_cnty_pop_coc
d3 <- d2 %>%
  group_by(coc_number, coc_name, year) %>%
  summarize_at(mean.vars, ~ weighted.mean(.x, pct_cnty_pop_coc, na.rm=TRUE)) %>%
  ungroup()
# count-type variables: sum of x scaled by weight w, ignoring missing values
sum.partial <- function(x,w) {
  sum(x*w, na.rm=TRUE)
}
d4 <- d2 %>%
  group_by(coc_number, coc_name, year) %>%
  summarize_at(count.vars, ~ sum.partial(.x, pct_cnty_pop_coc/100)) %>%
  ungroup()
d <- inner_join(d3,select(d4,-coc_name),by=c("coc_number","year"))
#---- Merge ACS data with PIT data
d.PIT <- d.PIT %>%
  rename(
    year = Year,
    coc_number = CoC.Id
  )
d <- data.table(inner_join(d, d.PIT, by=c("coc_number","year")))
#---- Derived variables
d[, unemp.rate:=Unemployed/CivLaborForce*100]
d[, pct.black:=Black/TotalPopulation*100]
d[, pct.hisp:=Hispanic/TotalPopulation*100]
d[, pct.crowded:=(OwnOccHU1.01to1.5+OwnOccHU1.51to2+OwnOccHU2.01plus+
  RentOccHU1.01to1.5+RentOccHU1.51to2+RentOccHU2.01plus)/(
  OwnOccHU+RentOccHU)*100]
d[, pct.highrent:=(RTI_30to35+RTI_35to40+RTI_40to50+RTI_50plus)/(RenterOccHU-RTI_NA)*100]
d[, homeless.rate:=Homeless/(TotalPopulation/100000)]
d[, rooms.per.pop:=TotalRooms/(TotalPopulation)]
d[, ln.median.inc:=log(MedianIncome)]
d[, poverty.rate:=Poverty/Pop.Poverty.Determined*100]
d[, pct.bachelors:=(Educ.Bac+Educ.Grad)/(Pop.25.Over)*100]
In this post, I will explore the correlates of homelessness using data from the HUD PIT counts and the ACS. The unit of analysis is the CoC-year. This post explains the concept of a CoC (Continuum of Care) and the HUD data. ACS data is collected at the county-year level and aggregated up to CoCs using the crosswalk from Tom Byrne. The data covers 372 CoCs for the years 2007 to 2019. I focus on the following variables:
- homeless.rate: The number of homeless people in the CoC per 100,000 population
- rooms.per.pop: The total number of rooms in the CoC (rooms in regular housing units, not shelter rooms) divided by the population
- pct.crowded: The percentage of housing units with more than 1 occupant per room, which is considered crowded. The Census counts living rooms, kitchens, and dining rooms as rooms but not bathrooms. So three people living in an apartment with 1 bedroom, 1 kitchen, and 1 living room would be 1 occupant per room.
- pct.highrent: The percentage of renters spending more than 30% of their income on rent
- ln.median.inc: The natural log of the median household income in the CoC
- unemp.rate: The unemployment rate in the CoC, in percent
- poverty.rate: The percentage of people with income below 100% of the poverty level in the CoC
- pct.bachelors: The percentage of people aged 25 or older with a bachelor's degree or higher in the CoC
- pct.black: The percentage of people in the CoC who are Black
- pct.hisp: The percentage of people in the CoC who are Hispanic
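To make the county-to-CoC aggregation concrete, here is a minimal sketch on made-up data. The CoC ids, county codes, populations, incomes, and homeless counts are all hypothetical, and it uses base R rather than the dplyr pipeline in the post. Counts are apportioned by the share of each county that falls in the CoC and then summed, while mean-type variables are averaged using that same percentage as the weight.

```r
# Toy crosswalk (all numbers invented): county 1001 lies entirely in CoC
# "XX-500"; county 1003 is split 40/60 between "XX-500" and "XX-501".
toy <- data.frame(
  county_fips      = c(1001, 1003, 1003),
  coc_number       = c("XX-500", "XX-500", "XX-501"),
  pct_cnty_pop_coc = c(100, 40, 60),
  TotalPopulation  = c(50000, 200000, 200000),
  MedianIncome     = c(40000, 60000, 60000)
)

# Counts: give each CoC its share of the county total, then sum within CoC.
toy$pop_share <- toy$TotalPopulation * toy$pct_cnty_pop_coc / 100
coc_pop <- tapply(toy$pop_share, toy$coc_number, sum)

# Means: average county values, weighted by the same percentage.
coc_inc <- sapply(split(toy, toy$coc_number), function(g) {
  weighted.mean(g$MedianIncome, g$pct_cnty_pop_coc, na.rm = TRUE)
})

# Hypothetical PIT counts, converted to a rate per 100,000 residents.
homeless <- c("XX-500" = 260, "XX-501" = 120)
homeless_rate <- homeless / (coc_pop[names(homeless)] / 100000)

print(coc_pop)
print(round(coc_inc))
print(homeless_rate)
```

Note that the weight for the means is the share of each county inside the CoC, as in the code above, not the number of people that share represents, so a small county that lies fully inside a CoC counts for more than a partial slice of a much larger one.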
Summary Statistics
Summary statistics are shown here:
Code
d %>%
  subset(!is.infinite(homeless.rate)) %>%
  sumtable(
    vars=c(
      "homeless.rate",
      "rooms.per.pop",
      "pct.crowded",
      "pct.highrent",
      "ln.median.inc",
      "unemp.rate",
      "poverty.rate",
      "pct.bachelors",
      "pct.black",
      "pct.hisp"
    )
  )
Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max |
---|---|---|---|---|---|---|---|
homeless.rate | 4753 | 244.161 | 282.856 | 17.76 | 107.645 | 266.814 | 4275.583 |
rooms.per.pop | 4753 | 2.484 | 0.312 | 1.605 | 2.334 | 2.675 | 4.543 |
pct.crowded | 4317 | 2.955 | 2.239 | 0.365 | 1.577 | 3.423 | 15.256 |
pct.highrent | 4753 | 50.953 | 5.638 | 29.593 | 47.005 | 54.621 | 70.218 |
ln.median.inc | 4753 | 10.934 | 0.252 | 10.27 | 10.748 | 11.089 | 11.93 |
unemp.rate | 4750 | 7.344 | 3.008 | 1.426 | 5.104 | 9.076 | 23.276 |
poverty.rate | 4753 | 13.92 | 4.835 | 2.275 | 10.576 | 17.009 | 39.623 |
pct.bachelors | 4753 | 31.155 | 9.824 | 10.094 | 23.959 | 36.37 | 76.158 |
pct.black | 4753 | 13.712 | 12.357 | 0 | 4.751 | 18.247 | 66.194 |
pct.hisp | 4753 | 12.792 | 13.189 | 0 | 4.427 | 15.853 | 85.03 |
Regression Results
To investigate the correlates of homelessness, I run linear regressions with homeless.rate as the dependent variable. The results are reported in the Table below. In the first column, no fixed effects are included. In the second column, year fixed effects are included. In the third column, CoC fixed effects are included. In the last column, both year and CoC fixed effects are included.
Code
# felm formula parts: regressors | fixed effects | instruments | cluster variables
r1 <- felm(homeless.rate ~ rooms.per.pop + pct.crowded + pct.highrent +
    ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
  | 0 | 0 | 0,
  data=d)
r2 <- felm(homeless.rate ~ rooms.per.pop + pct.crowded + pct.highrent +
    ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
  | year | 0 | 0,
  data=d)
r3 <- felm(homeless.rate ~ rooms.per.pop + pct.crowded + pct.highrent +
    ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
  | coc_number | 0 | 0,
  data=d)
r4 <- felm(homeless.rate ~ rooms.per.pop + pct.crowded + pct.highrent +
    ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
  | year + coc_number | 0 | 0,
  data=d)
stargazer(r1,r2,r3,r4,type="html",
  title="Correlates of Homelessness",
  no.space=TRUE,
  omit=c("Constant"),
  add.lines=list(
    c("Year FE", " ", "Y", " ", "Y"),
    c("CoC FE", " ", " ", "Y", "Y")
  ))
Dependent variable: homeless.rate
Variable | (1) | (2) | (3) | (4) |
---|---|---|---|---|
rooms.per.pop | -40.113** (19.880) | -33.523 (20.999) | 49.967 (33.420) | 126.408*** (36.393) |
pct.crowded | 36.026*** (3.142) | 36.039*** (3.145) | -2.678 (3.326) | -0.600 (3.345) |
pct.highrent | 2.834*** (0.828) | 2.893*** (0.829) | 1.001 (0.758) | 1.461* (0.764) |
ln.median.inc | -308.950*** (40.225) | -298.195*** (47.520) | 4.359 (52.164) | 255.459*** (64.066) |
unemp.rate | 9.192*** (1.628) | 16.523*** (2.301) | 5.530*** (1.323) | 0.735 (1.849) |
poverty.rate | -1.430 (1.783) | -0.379 (2.114) | -3.139* (1.666) | 1.123 (1.843) |
pct.bachelors | 6.208*** (0.627) | 7.005*** (0.679) | -6.294*** (1.699) | -1.059 (1.837) |
pct.black | -1.970*** (0.358) | -2.476*** (0.367) | 0.603 (2.928) | 9.864*** (3.134) |
pct.hisp | -3.310*** (0.470) | -3.603*** (0.472) | -7.828*** (2.254) | 3.973 (2.764) |
Year FE | | Y | | Y |
CoC FE | | | Y | Y |
Observations | 4,315 | 4,315 | 4,315 | 4,315 |
R² | 0.111 | 0.120 | 0.778 | 0.782 |
Adjusted R² | 0.109 | 0.115 | 0.757 | 0.761 |
Residual Std. Error | 249.407 (df = 4305) | 248.502 (df = 4293) | 130.328 (df = 3940) | 129.303 (df = 3928) |
Note: *p<0.1; **p<0.05; ***p<0.01
In case outliers in homeless.rate are having an undue effect on the results (the distribution is quite skewed), I also ran the regression using log(homeless.rate) as the dependent variable. The results are reported below, but they are qualitatively similar to the results without taking logs.
Code
r1 <- felm(log(homeless.rate) ~ rooms.per.pop + pct.crowded + pct.highrent +
    ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
  | 0 | 0 | 0,
  data=d)
r2 <- felm(log(homeless.rate) ~ rooms.per.pop + pct.crowded + pct.highrent +
    ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
  | year | 0 | 0,
  data=d)
r3 <- felm(log(homeless.rate) ~ rooms.per.pop + pct.crowded + pct.highrent +
    ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
  | coc_number | 0 | 0,
  data=d)
r4 <- felm(log(homeless.rate) ~ rooms.per.pop + pct.crowded + pct.highrent +
    ln.median.inc + unemp.rate + poverty.rate + pct.bachelors + pct.black + pct.hisp
  | year + coc_number | 0 | 0,
  data=d)
stargazer(r1,r2,r3,r4,type="html",
  title="Correlates of Homelessness",
  no.space=TRUE,
  omit=c("Constant"),
  add.lines=list(
    c("Year FE", " ", "Y", " ", "Y"),
    c("CoC FE", " ", " ", "Y", "Y")
  ))
Dependent variable: log(homeless.rate)
Variable | (1) | (2) | (3) | (4) |
---|---|---|---|---|
rooms.per.pop | -0.322*** (0.054) | -0.281*** (0.057) | 0.092 (0.073) | 0.402*** (0.078) |
pct.crowded | 0.118*** (0.009) | 0.119*** (0.009) | -0.003 (0.007) | 0.005 (0.007) |
pct.highrent | 0.012*** (0.002) | 0.012*** (0.002) | 0.002 (0.002) | 0.004** (0.002) |
ln.median.inc | -1.358*** (0.109) | -1.251*** (0.129) | -0.265** (0.113) | 0.702*** (0.137) |
unemp.rate | 0.020*** (0.004) | 0.034*** (0.006) | 0.016*** (0.003) | 0.001 (0.004) |
poverty.rate | -0.011** (0.005) | -0.004 (0.006) | -0.009** (0.004) | 0.002 (0.004) |
pct.bachelors | 0.021*** (0.002) | 0.022*** (0.002) | -0.017*** (0.004) | -0.001 (0.004) |
pct.black | -0.007*** (0.001) | -0.008*** (0.001) | -0.015** (0.006) | 0.018*** (0.007) |
pct.hisp | -0.011*** (0.001) | -0.011*** (0.001) | -0.029*** (0.005) | 0.009 (0.006) |
Year FE | | Y | | Y |
CoC FE | | | Y | Y |
Observations | 4,315 | 4,315 | 4,315 | 4,315 |
R² | 0.194 | 0.201 | 0.871 | 0.877 |
Adjusted R² | 0.192 | 0.197 | 0.859 | 0.865 |
Residual Std. Error | 0.677 (df = 4305) | 0.675 (df = 4293) | 0.283 (df = 3940) | 0.277 (df = 3928) |
Note: *p<0.1; **p<0.05; ***p<0.01
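Because the dependent variable here is in logs, the coefficients read as approximate proportional changes rather than changes in homeless people per 100,000. The sketch below converts three column (4) coefficients from the table above into percent changes. The helper `pct_change` is a name I made up for this illustration, and the sizes of the changes (half a room, 10 percentage points, 10% income) are just convenient examples.

```r
# Column (4) coefficients from the log(homeless.rate) table.
b_rooms    <- 0.402
b_highrent <- 0.004
b_ln_inc   <- 0.702

# Hypothetical helper: percent change in the homeless rate implied by a
# change `delta` in a regressor with coefficient `b` in a log-level model.
pct_change <- function(b, delta) 100 * (exp(b * delta) - 1)

rooms_pct <- pct_change(b_rooms, 0.5)     # half a room more per person
rent_pct  <- pct_change(b_highrent, 10)   # 10 percentage points more high-rent renters

# ln.median.inc is itself a log, so its coefficient is an elasticity:
# a 10% rise in income scales the homeless rate by 1.10^b.
inc_pct <- 100 * (1.10^b_ln_inc - 1)

print(round(c(rooms = rooms_pct, highrent = rent_pct, income = inc_pct), 1))
```

These are associations conditional on the other regressors and the fixed effects, not causal effects.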
Discussion
(1) No Fixed Effects. Without fixed effects, we see that many of the variables are significant for explaining homelessness and most are of the expected sign. The results suggest that there is more homelessness when there is less housing per population (rooms.per.pop), when conditions are more crowded (pct.crowded), and when housing is expensive relative to income (pct.highrent). Homelessness is also higher in poorer communities (ln.median.inc) and communities with more unemployment (unemp.rate). Interestingly, homelessness tends to be higher in areas with a more educated population (pct.bachelors) and a lower fraction of minorities (pct.black and pct.hisp).
(2) Year Fixed Effects. Adding year fixed effects does not change the overall results by much, and the R-squared barely moves (from 0.111 to 0.120). This suggests that cross-sectional variation in homelessness is much larger than temporal variation. That is, the differences in homelessness across CoCs are much larger than the changes in homelessness over time.
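The logic of reading R-squared this way can be sketched on simulated data. Everything below is invented: the panel dimensions, the standard deviations of the CoC and year effects, and the noise level are assumptions chosen so that permanent CoC differences dominate, and plain `lm` stands in for `felm`.

```r
set.seed(1)
n_coc  <- 50
n_year <- 13
panel <- expand.grid(coc = factor(seq_len(n_coc)), year = factor(2007:2019))

# Assumption: large permanent CoC differences, small common year shocks.
coc_effect  <- rnorm(n_coc,  sd = 100)
year_effect <- rnorm(n_year, sd = 10)
panel$rate <- 250 +
  coc_effect[as.integer(panel$coc)] +
  year_effect[as.integer(panel$year)] +
  rnorm(nrow(panel), sd = 30)

# Share of variance explained by year dummies alone vs CoC dummies alone.
r2_year <- summary(lm(rate ~ year, data = panel))$r.squared
r2_coc  <- summary(lm(rate ~ coc,  data = panel))$r.squared
print(c(year_only = r2_year, coc_only = r2_coc))
```

When most of the variation is permanent differences between units, year dummies explain very little and unit dummies explain most of it, which is the pattern in the regression tables.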
(3) CoC Fixed Effects. Adding CoC fixed effects reduces the magnitude of many of the estimated coefficients and in many cases switches the sign. Interestingly, none of the housing market variables remain statistically significant. This suggests that although differences in the housing market can explain differences in the baseline level of homelessness across CoCs, changes in housing market conditions do not explain the changes in homelessness over time within CoCs. The R-squared jumps from 0.111 to 0.778 when adding CoC fixed effects, suggesting again that there is a large amount of permanent heterogeneity in homeless rates across CoCs.
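To see why CoC fixed effects isolate within-CoC changes, here is a small simulated sketch. The data-generating process and the true slope of 5 are assumptions made up for the illustration, and it uses base R `lm` instead of `felm`. It shows that including a dummy for every CoC gives the same slope as regressing the demeaned outcome on the demeaned regressor, and that the pooled slope can be far off when the regressor is correlated with the permanent CoC level.

```r
set.seed(2)
n_coc  <- 30
n_year <- 13
panel <- data.frame(coc = rep(seq_len(n_coc), each = n_year))

alpha <- rnorm(n_coc, sd = 100)                        # permanent CoC level
panel$x <- alpha[panel$coc] / 50 + rnorm(nrow(panel))  # x correlated with alpha
panel$y <- 200 + alpha[panel$coc] + 5 * panel$x + rnorm(nrow(panel), sd = 20)

# Pooled OLS mixes between-CoC and within-CoC variation.
b_pooled <- coef(lm(y ~ x, data = panel))["x"]

# Fixed effects via one dummy per CoC.
b_fe <- coef(lm(y ~ x + factor(coc), data = panel))["x"]

# Same slope from the within transformation: subtract CoC means first.
panel$x_dm <- panel$x - ave(panel$x, panel$coc)
panel$y_dm <- panel$y - ave(panel$y, panel$coc)
b_within <- coef(lm(y_dm ~ x_dm, data = panel))["x_dm"]

print(c(pooled = unname(b_pooled), fe = unname(b_fe), within = unname(b_within)))
```

The fixed-effects and within estimates agree because the dummies absorb each CoC's average, so only deviations from that average identify the slope.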
(4) Year and CoC Fixed Effects. Adding both CoC and year fixed effects changes the sign on a number of coefficients. The coefficients on rooms.per.pop and ln.median.inc become large and significant. These results say that a 10% increase in median income is associated with an increase of about 25 homeless people per 100,000, which is about a tenth of a standard deviation. An increase of just half a room per person is associated with an increase of about 63 homeless people per 100,000. These results are strange because they suggest that CoCs with more housing supply and with higher incomes also have more homelessness, after controlling for national trends (year fixed effects) and baseline CoC levels (CoC fixed effects). But perhaps it’s not so strange given that we know the largest increases in homelessness since 2007 have happened in California and New York. It may be that CoCs that experienced rapid economic growth also ended up leaving many people behind, hence a rise in homelessness. The positive coefficient on pct.highrent is consistent with this, as the increase in housing costs that comes with economic growth may have contributed to increased homelessness. Yet, it doesn’t seem like that’s the whole story. Based on these results, a 10 percentage point increase in the share of renters paying over 30% of their income in rent is associated with only about 15 more homeless people per 100,000.
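The back-of-the-envelope magnitudes above can be reproduced directly from the reported numbers. This sketch only copies the column (4) coefficients from the homeless.rate table and the standard deviation from the summary table; the variable names are mine.

```r
# Column (4) coefficients from the homeless.rate regression table.
b_ln_inc   <- 255.459
b_rooms    <- 126.408
b_highrent <- 1.461
sd_rate    <- 282.856   # Std. Dev. of homeless.rate from the summary table

# A 10% rise in median income raises ln.median.inc by log(1.10).
inc_effect  <- b_ln_inc * log(1.10)
# Half a room more per person.
room_effect <- b_rooms * 0.5
# 10 percentage points more renters paying over 30% of income in rent.
rent_effect <- b_highrent * 10
# Income effect expressed in standard deviations of the homeless rate.
inc_in_sd   <- inc_effect / sd_rate

print(round(c(income = inc_effect, rooms = room_effect,
              highrent = rent_effect, income_sd = inc_in_sd), 2))
```

Using `log(1.10)` rather than the 0.1 shortcut gives a slightly smaller income effect, but either way it rounds to roughly 25 per 100,000.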
Conclusion
Here are my takeaways from today’s exercise:
- There is much more cross-sectional variation than temporal variation in homelessness.
- Housing market variables can explain some of the cross-sectional variation in homelessness, but are not statistically significant in explaining changes to homelessness within CoCs.
- CoCs with homelessness increases that were above trend also had higher income growth and higher growth in housing supply as measured by rooms per person.
This was a fairly limited exercise on a limited number of variables. Still, it shows that there don’t seem to be any easy, straightforward explanations for what drove changes in homelessness within CoCs from 2007 to 2019. We will have to continue looking for other explanations.