weighted.mean
Usage
Used to calculate the weighted mean of a variable.
Usage 1:
weighted.mean(dataframe$variable, dataframe$weights, na.rm=TRUE)
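- This calculates the weighted mean of variable in dataframe, using the weights column as the weights.
- We always include na.rm=TRUE as the third input to tell R to drop observations where variable is NA before calculating. NA values in weights are not removed and still produce an NA result.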
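A minimal sketch of what na.rm=TRUE changes, using made-up toy vectors (x, w and w_bad are hypothetical names, not columns from any dataset used here):

```r
# Toy data, made up for illustration: the third value is missing
x <- c(10, 20, NA, 40)
w <- c(1, 2, 3, 4)

# Without na.rm=TRUE, a single NA in x makes the whole result NA
weighted.mean(x, w)

# With na.rm=TRUE, the NA observation and its weight are both dropped:
# (10*1 + 20*2 + 40*4) / (1 + 2 + 4) = 30
weighted.mean(x, w, na.rm=TRUE)

# na.rm=TRUE does not rescue a missing weight: the result is still NA
w_bad <- c(1, NA, 3, 4)
weighted.mean(x, w_bad, na.rm=TRUE)
```

If the weights themselves can be missing, those rows need to be filtered out before calling weighted.mean.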
Usage 2:
some_dataframe_name <- dataframe %>%
  group_by(groupvar1, groupvar2, ...) %>%
  summarize(
    avg_variable = weighted.mean(variable, weights, na.rm=TRUE)
  )
- This calculates the weighted mean of variable in dataframe within each possible combination of values for groupvar1, groupvar2, …
- See the vignette on summary statistics for more details.
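The grouped form can be checked by hand on a small data frame. The sketch below assumes dplyr is installed; the data frame toy and its columns group, value and weight are hypothetical and made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: two groups, with made-up values and weights
toy <- data.frame(
  group  = c("a", "a", "b", "b"),
  value  = c(1, 3, 10, 20),
  weight = c(1, 3, 1, 1)
)

# One weighted mean per group
toy_means <- toy %>%
  group_by(group) %>%
  summarize(
    avg_value = weighted.mean(value, weight, na.rm=TRUE)
  )

# Group "a": (1*1 + 3*3) / (1 + 3) = 2.5
# Group "b": (10*1 + 20*1) / (1 + 1) = 15
toy_means
```

The result has one row per group, so toy_means here has two rows.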
Example 1
rm(list=ls())
df <- read.csv("IPUMS_ACS2019_CA_1.csv")
weighted.mean(df$AGE, df$PERWT, na.rm=TRUE)
This calculates the weighted mean of AGE in df using PERWT as weights.
Example 2
rm(list=ls())
library(dplyr)
df <- read.csv("IPUMS_ACS2019_CA_1.csv")
df$EMPLOYED <- df$EMPSTAT==1
emprate_by_race <- df %>%
  group_by(RACHSING) %>%
  summarize(
    EMPLOYMENT_RATE = weighted.mean(EMPLOYED, PERWT, na.rm=TRUE)
  )
This calculates the weighted mean of EMPLOYED (i.e. the employment rate) within each value of RACHSING (i.e. by race), using PERWT as the weights.
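The weighted mean of a TRUE/FALSE variable works as a rate because R treats TRUE as 1 and FALSE as 0 in arithmetic. A small sketch with made-up numbers (employed and perwt are hypothetical vectors, not the ACS columns):

```r
# Hypothetical toy data: three sampled people, two of them employed
employed <- c(TRUE, FALSE, TRUE)
perwt    <- c(100, 50, 50)   # number of people each sampled person represents

# TRUE counts as 1 and FALSE as 0, so the weighted mean is the
# weighted share of employed people:
# (100 + 50) / (100 + 50 + 50) = 0.75
rate <- weighted.mean(employed, perwt)
rate
```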
Mathematical Background
Suppose you have \(N\) samples of a variable:
\[(x_1, x_2, x_3, \ldots, x_N)\]
along with \(N\) weights:
\[(w_1, w_2, w_3, \ldots, w_N)\]
The weighted mean of \(x\) using \(w\) as weights is:
\[\frac{ \sum_{i=1}^{N} x_i w_i }{ \sum_{i=1}^{N} w_i }\]
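As a quick check, the formula can be computed by hand in R and compared with weighted.mean. The vectors x and w below are made-up values for illustration.

```r
# Made-up sample values and weights
x <- c(2, 4, 6)
w <- c(1, 1, 2)

# The formula written out: sum of x_i * w_i divided by sum of w_i
# (2*1 + 4*1 + 6*2) / (1 + 1 + 2) = 18 / 4 = 4.5
manual <- sum(x * w) / sum(w)

# The built-in function should give the same number
builtin <- weighted.mean(x, w)
all.equal(manual, builtin)

# With equal weights, the weighted mean is just the ordinary mean
equal_weights <- weighted.mean(x, c(1, 1, 1))
all.equal(equal_weights, mean(x))
```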