filter

Requires the dplyr package.


Usage

Used to select only the rows of a dataframe that match a condition.

Syntax:

some_dataframe_name <- filter(dataframe, condition)
  • The first input must be a dataframe.
  • The second input is a condition.
  • The output is a dataframe that contains only the rows that match the condition.

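If you store the output in an object with the same name as the input dataframe (e.g. df <- filter(df, EMPSTAT==1)), you will overwrite the input dataframe. The rows that did not match the condition are then gone from df for the rest of your session, and you would have to reload the data to get them back, so you should avoid doing this!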
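
The following is a minimal sketch of the safer pattern, where the output is stored under a new name. It uses a small made-up dataframe (the values are for illustration only and do not come from the IPUMS file) and assumes dplyr is installed.

```r
library(dplyr)

# Small made-up dataframe for illustration (not real IPUMS data)
df <- data.frame(ID = 1:4, EMPSTAT = c(1, 3, 2, 1))

# Store the result under a new name so that df keeps all of its rows
df_employed <- filter(df, EMPSTAT == 1)

nrow(df)           # 4: the original dataframe is unchanged
nrow(df_employed)  # 2: only the rows that match the condition
```

Comparing nrow() before and after filtering is a quick way to confirm how many rows were dropped.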


Example 1

rm(list=ls())
library(dplyr)

df <- read.csv("IPUMS_ACS2019_CA_1.csv")

df_employed <- filter(df, EMPSTAT==1)

In this example, only the rows with EMPSTAT==1 (employed individuals) are kept in the dataframe df_employed.


Example 2

Suppose dataframe df contains the following data:

ID EMPSTAT SEX
1 1 1
2 3 2
3 2 1
4 1 2
5 1 1
6 2 2

If you ran the command:

df_employed_males <- filter(df, EMPSTAT==1 & SEX==1)

Then df_employed_males will contain:

ID EMPSTAT SEX
1 1 1
5 1 1
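
To try Example 2 yourself without a data file, the table above can be typed in by hand. This is only a sketch: it builds df with data.frame() instead of read.csv(), and it assumes dplyr is installed.

```r
library(dplyr)

# Recreate the six-row table from Example 2 by hand
df <- data.frame(
  ID      = 1:6,
  EMPSTAT = c(1, 3, 2, 1, 1, 2),
  SEX     = c(1, 2, 1, 2, 1, 2)
)

# Keep only the rows where BOTH conditions are true
df_employed_males <- filter(df, EMPSTAT == 1 & SEX == 1)

df_employed_males$ID  # 1 5
```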

A common error

Be careful when using filter. It’s easy to create a condition which causes it to return zero rows.

For example, suppose df contains the following data, and we want to select the people with EMPSTAT==1 and the people with EMPSTAT==2.

ID EMPSTAT SEX
1 1 1
2 3 2
3 2 1
4 1 2
5 1 1
6 2 2

Students will commonly write something like this:

# (This is an example of bad code)
filtered_df <- filter(df, EMPSTAT==1 & EMPSTAT==2)

But if you do this, filtered_df will contain zero rows. This is because a single row cannot have both EMPSTAT==1 AND EMPSTAT==2, so no rows match the condition.

The correct way to select the people with EMPSTAT==1 and the people with EMPSTAT==2 is this:

# (This is the correct way to select people with EMPSTAT==1 or EMPSTAT==2)
filtered_df <- filter(df, EMPSTAT==1 | EMPSTAT==2)

This makes use of the OR operator (|) instead of the AND operator (&). It requests rows where EMPSTAT==1 OR EMPSTAT==2. The resulting dataframe will contain:

ID EMPSTAT SEX
1 1 1
3 2 1
4 1 2
5 1 1
6 2 2
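
The difference between & and | can be checked directly by counting rows. The sketch below rebuilds the table above by hand (an assumption made so the code runs without a data file; the object names bad_df, good_df and in_df are made up for this illustration). It also shows the base R %in% operator, which is one alternative way to write "EMPSTAT is 1 or 2".

```r
library(dplyr)

# Recreate the six-row table by hand
df <- data.frame(
  ID      = 1:6,
  EMPSTAT = c(1, 3, 2, 1, 1, 2),
  SEX     = c(1, 2, 1, 2, 1, 2)
)

# AND: no single row can satisfy both conditions
bad_df <- filter(df, EMPSTAT == 1 & EMPSTAT == 2)

# OR: rows that satisfy either condition
good_df <- filter(df, EMPSTAT == 1 | EMPSTAT == 2)

# Alternative: %in% tests whether EMPSTAT is one of the listed values
in_df <- filter(df, EMPSTAT %in% c(1, 2))

nrow(bad_df)   # 0
nrow(good_df)  # 5
nrow(in_df)    # 5
```

If nrow() returns 0 after a filter, it is worth re-reading the condition before going any further.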