inner_join
Requires dplyr
package.
Usage
Used to combine two dataframes horizontally into a new dataframe. The dataframes are combined by matching the rows that have the same values on one or more identifier, or “key” variables.
This procedure is called merging two dataframes. It can also be called joining.
Usage:
some_dataframe_name <- inner_join(dataframe1, dataframe2, by=c(...))
- The first and second inputs must both be dataframes.
- The third input must be
by=c(...)
, with...
being the names of the variables you want to match on.- The variables names should be contained in double quotation marks, and separated by commas. (See examples below.)
- The output is a dataframe that combines the two input dataframes “horizontally”. It will include all the variables from both dataframes.
If you store the output in an object with the same name as one of the input dataframes, you will overwrite it. This causes data loss, so you should avoid doing this!
If more than one row in
dataframe1
match a row indataframe2
(or vice versa), then all of those rows are matched and kept in the output dataframe.
The purpose of an inner join is to combine two datasets that contain different pieces of information about the same set of subjects. For example, one dataset contains crime data about a set of neighborhoods, and another dataset contains housing data about the same neighborhoods.
Example 1
rm(list=ls())
library(dplyr)
df_1 <- read.csv("IPUMS_ACS2019_CA_1.csv")
df_2 <- read.csv("IPUMS_ACS2019_CA_2.csv")
df <- inner_join(df_1, df_2, by=c("YEAR", "SERIAL", "PERNUM"))
Example 2
Suppose df_students
contains the following data:
STUDENT_ID | CLASS_ID | TEST_SCORE |
---|---|---|
1 | A | 80 |
2 | B | 70 |
3 | B | 90 |
4 | C | 40 |
5 | A | 100 |
6 | C | 70 |
And suppose df_classes
contains:
CLASS_ID | AVERAGE_SCORE |
---|---|
A | 74.5 |
B | 80.4 |
C | 67.3 |
If you ran:
df_merged <- inner_join(df_students, df_classes, by=c("CLASS_ID"))
Then df_merged
will contain:
STUDENT_ID | CLASS_ID | TEST_SCORE | AVERAGE_SCORE |
---|---|---|---|
1 | A | 80 | 74.5 |
2 | B | 70 | 80.4 |
3 | B | 90 | 80.4 |
4 | C | 40 | 67.3 |
5 | A | 100 | 74.5 |
6 | C | 70 | 67.3 |
Other types of joins
There are other types of joins, including left_join
, right_join
, and outer_join
. However, we won’t need to use them in this class.