Lab 7
Linear Regressions II
In this lab, you will continue practicing your data skills. You will create a chart and run some regressions.
Background
Today, you will try to understand patterns in the homeownership rate using IPUMS ACS data for California in 2019. The data is contained in the CSV file lab07_assignment_data.csv.
In this data, each row is a household. (Previously, you worked with data in which each row is a person. For today’s lab, we’ll work with household-level data because the homeownership rate is only measured at the household level.)
The following variables are available in this data:
- YEAR: The year the household was surveyed
- SERIAL: The identification number of the household
- HHWT: The survey weight of the household. You need this to construct summary statistics and run regressions.
- STATEFIP: The state FIPS code (6 for California)
- COUNTYFIP: The county FIPS code that the household is located in
- OWNERSHP: Whether the household owns the home they live in
- HHINCOME: Total income of all members in the household
- SEX: Sex of the household head
- AGE: Age of the household head
- MARST: Marital status of the household head
- RACHSING: Race of the household head
- EDUC: Educational attainment of the household head
- EMPSTAT: Employment status of the household head
- OCC: Occupation of the household head
- IND: Industry of the household head
Don’t be lazy: make sure you look up the IPUMS codebook for any variables you don’t know the coding for.
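Before looking anything up, it can help to tabulate the raw codes that actually occur in a column, so you know which values to check in the codebook. The following is a minimal base R sketch; the data frame `toy` and its values are invented for illustration and are not taken from the lab file.

```r
# Toy stand-in for the lab data; these values are made up for illustration.
toy <- data.frame(
  OWNERSHP = c(1, 2, 1, 1, 2),
  MARST    = c(1, 6, 4, 1, 6)
)

# Count how often each raw code occurs.
# useNA = "ifany" adds a count of missing values when there are any.
ownershp_counts <- table(toy$OWNERSHP, useNA = "ifany")
print(ownershp_counts)

# List the distinct codes in a column, sorted, to compare against the codebook.
marst_codes <- sort(unique(toy$MARST))
print(marst_codes)
```

Running the same two calls on the real dataframe shows every code you will need to interpret before recoding.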
Preparation
Download lab07_assignment_data.csv and upload it to your R Studio Cloud files directory.
Make sure dplyr, ggplot2, lfe, and stargazer are installed in your R Studio Cloud workspace.
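One way to check which of these packages are still missing is sketched below. `missing_packages` is a helper name made up for this example, and the install step is left commented out so nothing is installed by accident.

```r
# Packages this lab relies on.
needed <- c("dplyr", "ggplot2", "lfe", "stargazer")

# Return the subset of `pkgs` that is not installed in the current library.
# (`missing_packages` is a hypothetical helper, not part of any package.)
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(to_install)
} else {
  message("All required packages are installed.")
}
```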
Assignment
Create a new script that accomplishes the following tasks:
- Load the data from lab07_assignment_data.csv and store it in a dataframe called df.
- Assign NA to invalid values of HHINCOME.
- Create the following variables:
  - OWNHOME: A boolean variable indicating whether the household owns the home they live in
  - COLLEGE: A boolean variable indicating whether the household head has 4+ years of college education
  - MARRIED: A boolean variable indicating whether the household head is married
  - MALE: A boolean variable indicating whether the household head is male
  - AGESQ: The square of the age of the household head.
    - Note: Inside an R model formula, the exponent sign ^ does not mean arithmetic exponentiation, so compute the square as its own column instead: df$XSQ <- df$X * df$X.
- Create a line plot showing homeownership rate by age of household head.
  - Title the plot “Homeownership Rate by Age of Household Head: California 2019”
  - Label the X axis “Age”
  - Label the Y axis “Homeownership Rate”
  - Don’t forget to use HHWT as the survey weight when calculating homeownership rate.
- Run 3 regressions and show them on a single Stargazer table.
  - For all regressions, use only data of households with HHINCOME>0.
  - Regression 1: Regress OWNHOME on log(HHINCOME).
  - Regression 2: Regress OWNHOME on log(HHINCOME), AGE, AGESQ, MALE, COLLEGE, MARRIED, and as.factor(RACHSING).
  - Regression 3: In addition to all the variables in regression 2, include COUNTYFIP, IND, and OCC as dummy variables.
  - Don’t forget to use HHWT as the survey weights in the regression.
Upload the script to the Lab 07 Script assignment.
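The recoding and plotting tasks above can be sketched on a tiny invented dataset as follows. Every numeric code used here (9999999 as the HHINCOME missing code, OWNERSHP 1, EDUC 10 and above, MARST 1 and 2, SEX 1) is an assumption that you should confirm in the IPUMS codebook, and the toy rows are not from the lab file.

```r
library(dplyr)
library(ggplot2)

# Toy household data, invented for illustration only.
df <- data.frame(
  HHWT     = c(10, 30, 20, 20, 50, 50),
  OWNERSHP = c(1, 2, 1, 2, 1, 1),
  HHINCOME = c(50000, 9999999, 80000, 0, 120000, 60000),
  SEX      = c(1, 2, 2, 1, 1, 2),
  AGE      = c(30, 30, 45, 45, 60, 60),
  MARST    = c(1, 6, 2, 4, 1, 5),
  EDUC     = c(10, 6, 11, 7, 6, 10)
)

# ASSUMPTION: 9999999 is the "not applicable" code for HHINCOME.
df$HHINCOME[df$HHINCOME == 9999999] <- NA

# ASSUMPTIONS about the codes below; check each one in the codebook.
df$OWNHOME <- df$OWNERSHP == 1        # assumed: 1 = owned or being bought
df$COLLEGE <- df$EDUC >= 10           # assumed: 10 and above = 4+ years of college
df$MARRIED <- df$MARST %in% c(1, 2)   # assumed: 1, 2 = married
df$MALE    <- df$SEX == 1             # assumed: 1 = male
df$AGESQ   <- df$AGE * df$AGE

# Weighted homeownership rate for each age of household head.
rate_by_age <- df %>%
  group_by(AGE) %>%
  summarize(OWN_RATE = weighted.mean(OWNHOME, HHWT))

# Line plot with the title and axis labels the assignment asks for.
p <- ggplot(rate_by_age, aes(x = AGE, y = OWN_RATE)) +
  geom_line() +
  labs(
    title = "Homeownership Rate by Age of Household Head: California 2019",
    x = "Age",
    y = "Homeownership Rate"
  )
print(p)
```

Using `weighted.mean` with HHWT rather than a plain `mean` is what makes each household count in proportion to the number of households it represents.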
Expected output
====================================================================================
                                           Dependent variable:
                      --------------------------------------------------------------
                                                 OWNHOME
                      (1)                  (2)                  (3)
------------------------------------------------------------------------------------
log(HHINCOME)         0.138***             0.115***             0.111***
                      (0.001)              (0.001)              (0.001)
AGE                                        0.023***             0.022***
                                           (0.0004)             (0.0004)
AGESQ                                      -0.0001***           -0.0001***
                                           (0.00000)            (0.00000)
MALE                                       -0.021***            -0.019***
                                           (0.002)              (0.003)
COLLEGE                                    0.037***             0.042***
                                           (0.003)              (0.003)
MARRIED                                    0.152***             0.137***
                                           (0.003)              (0.003)
as.factor(RACHSING)2                       -0.130***            -0.112***
                                           (0.005)              (0.005)
as.factor(RACHSING)3                       0.019                0.006
                                           (0.018)              (0.018)
as.factor(RACHSING)4                       -0.012***            0.028***
                                           (0.004)              (0.004)
as.factor(RACHSING)5                       -0.063***            -0.038***
                                           (0.003)              (0.003)
Constant              -0.992***            -1.627***
                      (0.014)              (0.017)
------------------------------------------------------------------------------------
Observations          134,094              134,094              134,094
R2                    0.084                0.254                0.297
Adjusted R2           0.084                0.254                0.292
Residual Std. Error   4.682 (df = 134092)  4.226 (df = 134083)  4.117 (df = 133252)
====================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
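A table with this layout could be produced along the following lines. This is only a sketch on simulated data: the data frame `sim`, its coefficients, and the choice of `felm` from lfe for all three models are assumptions, and the numbers it prints will not match the expected output above. The indicator variables are stored as 0/1 here so that the row labels read MALE, COLLEGE, and MARRIED.

```r
library(lfe)
library(stargazer)

# Simulated households, invented for illustration only.
set.seed(1)
n <- 500
sim <- data.frame(
  HHWT      = sample(50:150, n, replace = TRUE),
  HHINCOME  = round(exp(rnorm(n, mean = 11, sd = 0.8))),
  AGE       = sample(20:85, n, replace = TRUE),
  MALE      = as.numeric(runif(n) < 0.5),
  COLLEGE   = as.numeric(runif(n) < 0.4),
  MARRIED   = as.numeric(runif(n) < 0.5),
  RACHSING  = sample(1:5, n, replace = TRUE),
  COUNTYFIP = as.factor(sample(c(1, 37, 59, 73), n, replace = TRUE)),
  IND       = as.factor(sample(1:6, n, replace = TRUE)),
  OCC       = as.factor(sample(1:6, n, replace = TRUE))
)
sim$AGESQ <- sim$AGE * sim$AGE
# Ownership is made more likely at higher income and age (arbitrary numbers).
sim$OWNHOME <- as.numeric(
  runif(n) < plogis(-8 + 0.6 * log(sim$HHINCOME) + 0.03 * sim$AGE)
)

# Keep only households with positive income (this also drops NA incomes).
reg_data <- subset(sim, HHINCOME > 0)

# Regression 1: income only, weighted by the household weight.
m1 <- felm(OWNHOME ~ log(HHINCOME),
           data = reg_data, weights = reg_data$HHWT)

# Regression 2: adds household head characteristics.
m2 <- felm(OWNHOME ~ log(HHINCOME) + AGE + AGESQ + MALE + COLLEGE +
             MARRIED + as.factor(RACHSING),
           data = reg_data, weights = reg_data$HHWT)

# Regression 3: county, industry, and occupation enter after "|",
# so felm absorbs them as fixed effects instead of printing each dummy.
m3 <- felm(OWNHOME ~ log(HHINCOME) + AGE + AGESQ + MALE + COLLEGE +
             MARRIED + as.factor(RACHSING) | COUNTYFIP + IND + OCC,
           data = reg_data, weights = reg_data$HHWT)

# Show all three models side by side as plain text.
stargazer(m1, m2, m3, type = "text")
```

Absorbing COUNTYFIP, IND, and OCC as fixed effects is one way to explain why the third column reports no constant and no rows for those dummies.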
Takeaways
- You can work independently on data projects involving data visualization and regression.