data type
data type is a very important concept in data science. Every variable has its own data type. The data type tells R what kind of data the variable is meant to represent.
Common data types
Data type | Name in R | Other names | Meaning | Examples |
---|---|---|---|---|
integer | int | Positive or negative whole numbers | -3, 0, 1, 400 | |
numeric | num | float, double | All numbers including decimals | -3.14, 0, 1.2313, 100 |
boolean | logi | binary, dichotomous | TRUE/FALSE and 1/0 variables | TRUE, FALSE, 1, 0 |
string | chr | character | Text | “public”, “economics” |
date | Date | A date or time | 3/31/2024 12:14:00 | |
factor | factor | categorical | A variable that can be one of a fixed number of possibilities | “WHITE”, “BLACK”, “ASIAN”, “HISPANIC”, “OTHER” |
Why data types are important
Data types are important because they tell R how the variable should behave under different circumstances.
For example, integer and numeric data can be added, subtracted, multiplied, divided, and compared. Dates and times can be subtracted (to get the time between two events), but not added, multiplied, or divided. Strings and categoricals cannot be added, subtracted, multiplied, or divided at all.
Boolean variables have a special set of operations that can be applied to them, called logical operators. See here for more detail.
Converting from one data type to another
To convert variable var
in dataframe df
to a different datatype:
- Convert to integer:
df$var <- as.integer(df$var)
- Convert to numeric:
df$var <- as.numeric(df$var)
- Convert to boolean:
df$var <- as.logical(df$var)
- Convert to string:
df$var <- as.character(df$var)
- Convert to date:
df$var <- as.Date(df$var)
- Convert to factor:
df$var <- as.factor(df$var)
In this class, you’ll probably only ever have to convert a variable from integer or string to factor.