Lab 2
Introduction to R
This lab introduces R, a programming language designed for statistical analysis.
You will:
- Sign up for an account on posit.cloud and learn how to use RStudio Cloud.
- Load data contained in a .csv file into R.
- Conduct basic data exploration tasks in R.
Instructions
Follow along as I show the class how to conduct today’s lab.
If you followed along correctly, you should end up with the following script. The script shows you how to:
- Read data from a CSV file into R and store it in a dataframe.
- Display the variables inside a dataframe and their data types.
- Tabulate and show summary statistics for variables in a dataframe.
- Create new variables based on existing variables.
rm(list=ls()) # Clear the workspace
# Read IPUMS_ACS2019_CA_1.csv and store it in df
df <- read.csv("IPUMS_ACS2019_CA_1.csv")
# Show the "structure" of the data
str(df)
# Tabulate SEX, MARST, and RACHSING
table(df$SEX)
table(df$MARST)
table(df$RACHSING)
# Show summary statistics for AGE and INCWAGE
summary(df$AGE)
summary(df$INCWAGE)
# Create a boolean variable called EMPLOYED
# that is TRUE when the person is employed
# and FALSE otherwise
df$EMPLOYED <- (df$EMPSTAT==1)
# Create a boolean variable called WORKING_AGE
# that is TRUE when the person's AGE is
# between 25 and 65 and FALSE otherwise
df$WORKING_AGE <- (df$AGE>=25) & (df$AGE<=65)
# Create a boolean variable called WORKING_AGE_EMPLOYED
# that is TRUE when the person's AGE is
# between 25 and 65 and the person is employed,
# and FALSE otherwise
df$WORKING_AGE_EMPLOYED <- (df$EMPLOYED==TRUE) & (df$WORKING_AGE==TRUE)
# Create a variable called LOG_INCWAGE that is
# equal to the natural log of INCWAGE
# (log(0) is -Inf, so zero wages become -Inf)
df$LOG_INCWAGE <- log(df$INCWAGE)
# Create a variable called BIRTH_YEAR that is
# equal to YEAR minus AGE
df$BIRTH_YEAR <- df$YEAR - df$AGE
# Show structure of data again
str(df)
# Tabulate EMPLOYED, WORKING_AGE, and WORKING_AGE_EMPLOYED
table(df$EMPLOYED)
table(df$WORKING_AGE)
table(df$WORKING_AGE_EMPLOYED)
# Show summary statistics of LOG_INCWAGE
summary(df$LOG_INCWAGE)
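If the logical operators in the script feel abstract, a minimal sketch on a made-up data frame may help. The data frame `toy` and all of its values are invented for illustration and are not drawn from the IPUMS file; only the column names mirror the script. The sketch also shows what happens to `log()` when a wage is zero, which is worth keeping in mind if INCWAGE contains zeros.

```r
# Toy data: values are invented for illustration only
toy <- data.frame(
  AGE     = c(18, 30, 45, 70),
  EMPSTAT = c(1, 1, 3, 1),
  INCWAGE = c(0, 52000, 0, 18000)
)

# A comparison returns one TRUE/FALSE value per row
toy$EMPLOYED    <- (toy$EMPSTAT == 1)
toy$WORKING_AGE <- (toy$AGE >= 25) & (toy$AGE <= 65)

# & is TRUE only when both sides are TRUE in that row
toy$WORKING_AGE_EMPLOYED <- toy$EMPLOYED & toy$WORKING_AGE

# log(0) is -Inf in R, so rows with zero wages get -Inf
toy$LOG_INCWAGE <- log(toy$INCWAGE)

print(toy)
table(toy$WORKING_AGE_EMPLOYED)
```

Because `toy$EMPLOYED` is already a logical vector, comparing it to `TRUE` is optional: `toy$EMPLOYED & toy$WORKING_AGE` gives the same result as the longer form used in the script.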
If you missed something during lecture, or if you need a refresher, you may find the following docs helpful:
- Vignettes:
- Functions:
- Glossary:
Assignment
- Create a new script that accomplishes the following tasks:
  - Read IPUMS_ACS2019_CA_1.csv and store it in a dataframe called df.
  - Create a boolean variable called UNEMPLOYED_WORKING_AGE_MALE that is TRUE if the person is:
    - Unemployed (but in the labor force)
    - Between the ages of 25 and 65
    - Male
  - Create a boolean variable called NLF_WORKING_AGE_MALE that is TRUE if the person is:
    - Not in the labor force
    - Between the ages of 25 and 65
    - Male
  - Show the structure of the data after having created the above two variables.
  - Tabulate UNEMPLOYED_WORKING_AGE_MALE and NLF_WORKING_AGE_MALE.
  Hint: You’ll need to look up the codes for EMPSTAT and SEX in IPUMS.
- Upload the script to the Lab 02 Script assignment.
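The general pattern for combining several conditions can be sketched on made-up data. Everything here is hypothetical: the data frame `toy`, the variables `STATUS` and `GROUP`, the name `FLAG`, and the codes 5 through 8 are invented for illustration. They are not the IPUMS codes, which you still need to look up.

```r
# Hypothetical data and codes, NOT the IPUMS coding scheme
toy <- data.frame(
  AGE    = c(22, 40, 58, 66),
  STATUS = c(7, 7, 8, 7),
  GROUP  = c(5, 5, 5, 6)
)

# See which codes actually occur before writing conditions
table(toy$STATUS)
table(toy$GROUP)

# Chain conditions with &; each pair of parentheses holds one condition
toy$FLAG <- (toy$STATUS == 7) &
  (toy$AGE >= 25) & (toy$AGE <= 65) &
  (toy$GROUP == 5)

table(toy$FLAG)
```

A row is `TRUE` only when every condition holds, so tabulating the new variable is a quick check that the count looks plausible.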
Takeaways
- You can use RStudio Cloud.
- You can do the following basic tasks in R:
  - Read a CSV file into a dataframe
  - View and browse a dataframe
  - Show the structure of a dataframe
  - Identify the datatypes of variables inside a dataframe
  - Tabulate and summarize variables inside a dataframe
  - Create new variables in a dataframe
  - Use logical operators to create new boolean variables
- You understand the concept of data types
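To make the idea of data types concrete, here is a small sketch that checks the type of each column in a made-up data frame. The data frame `toy` and its columns are invented for illustration; `class()` and `sapply()` are base R functions that complement the `str()` output used in the lab.

```r
# Toy data frame with invented values
toy <- data.frame(
  AGE  = c(30L, 45L),        # integer (the L suffix marks an integer)
  WAGE = c(52000.5, 18000),  # numeric (double)
  NAME = c("a", "b"),        # character
  stringsAsFactors = FALSE   # keep text as character in older R versions
)
toy$ADULT <- toy$AGE >= 18   # logical (boolean)

# class() reports the type of a single column
class(toy$AGE)
class(toy$ADULT)

# sapply() applies class() to every column at once
types <- sapply(toy, class)
print(types)
```

Knowing a column's type matters because functions behave differently by type: `summary()` reports quartiles for a numeric column but counts of `TRUE` and `FALSE` for a logical one.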