Using the Census API for bulk data download

Anyone who’s worked with demographic data knows that downloading Census data can be a pain. I used to use the browser-based tools provided in data.census.gov, but there are many limitations. For one, the interface doesn’t give you access to all the available years (e.g. a table may be available for 2007 but the interface only goes back to 2010). For another, the interface won’t let you download data for all census tracts and all years at the same time. It takes a lot of clicks to get anything done on data.census.gov!

In this note I will explain my workflow for automating the data download process using the Census API. Hopefully this will save you a lot of time and effort in getting your data!

Step 1: Identify the data table you want to retrieve

We’ll still start with data.census.gov to look for the data we need. For today’s example, I’ll be looking for data on housing units and vacancy rates for all census tracts in the US. Specifically, I want to find the number of the table that contains this data. Experienced users may already be familiar with this and may want to skip this step.

First, I navigate to data.census.gov in my browser and click the “Advanced Search” link that appears when I click the search bar.

Next, I use the filters to select the topics, geographies, and years I’m interested in. Note that you can’t select all census tracts for the entire US; you can only retrieve census tracts on a state-by-state basis.

For this example, I selected Topics >> Housing >> Housing, and Geography >> Tract >> Alabama >> Baldwin County, Alabama >> All Census Tracts within Baldwin County. It doesn’t really matter what state or county you select. The purpose is to find the right data table and survey that contains the information we want at the census tract level.

After clicking Search, the interface returns a list of data tables that satisfy the criteria. For my search, the most promising table seems to be DP04: Selected Housing Characteristics, which is available from the product 2019 ACS 5-Year Estimates Data Profiles. Note both the TableID and the Product as these will be important later when we use the API.

Step 2: Identify the variable codes for the variables you’re interested in

The next step is to identify the variable codes for the variables you’re interested in. The API doesn’t label variables with human-readable names like “Housing Units”. Instead, the variables are identified via codes like DP04_0001E. So we’ll have to find out what the codes are for the variables we want to download.

For this example, we’ll be downloading data for Total Housing Units, Occupied Housing Units, Vacant Housing Units, Homeowner Vacancy Rate, and Rental Vacancy Rate, all of which are under the group Housing Occupancy.

To find the variable codes (and a lot of other useful information about the API), navigate to https://www.census.gov/data/developers/data-sets.html in your browser. Then click on the product you want more information about. Remember how I asked you to note the Product? Since our product was the 2019 ACS 5-Year Estimates Data Profiles, we want to first click on American Community Survey 5-Year Data (2009-2019).

This takes us to a page with metadata about the API for the ACS 5-year data. Since our table was a Data Profile, we want to scroll down to the section labeled Data Profile and click on one of the links for 2019 ACS Data Profiles Variables. You can click the HTML, XML, or JSON link, whichever you’re most comfortable looking at. The XML and JSON files will probably be more useful for advanced users who want to analyze the variables themselves programmatically.

I clicked on the HTML. This pulls up a list of variable codes and their descriptions. Remember how all our variables come from a group called Housing Occupancy? This makes finding the variables we want easier. Press CTRL+F and search for “Estimate!!HOUSING OCCUPANCY” to find our variables! The codes we want are DP04_0001E (total housing units), DP04_0002E (occupied housing units), DP04_0003E (vacant housing units), DP04_0004E (homeowner vacancy rate), and DP04_0005E (rental vacancy rate).
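
If you’d rather search programmatically than with CTRL+F, the JSON version of the same list can be filtered with a few lines of Python. This is only a minimal sketch: it assumes the JSON file has already been loaded into a dict (for example with requests.get(...).json() on the JSON link) and that it has a top-level "variables" key mapping each code to an entry with a "label" field. The function name find_codes and the small sample dict are mine, written by hand for illustration, not real API output.

```python
def find_codes(variables_json, phrase):
    """Return {code: label} for every variable whose label contains phrase."""
    matches = {}
    for code, info in variables_json["variables"].items():
        label = info.get("label", "")
        # Case-insensitive substring match, like a CTRL+F search.
        if phrase.lower() in label.lower():
            matches[code] = label
    # Sort by code so related variables end up next to each other.
    return dict(sorted(matches.items()))


# Hand-written sample in the assumed shape of the variables JSON (not real output).
sample = {
    "variables": {
        "DP04_0002E": {"label": "Estimate!!HOUSING OCCUPANCY!!Total housing units!!Occupied housing units"},
        "DP04_0001E": {"label": "Estimate!!HOUSING OCCUPANCY!!Total housing units"},
        "NAME": {"label": "Geographic Area Name"},
    }
}

for code, label in find_codes(sample, "Estimate!!HOUSING OCCUPANCY").items():
    print(code, label)
```

With the real file loaded in place of sample, the same call should list every estimate in the Housing Occupancy group.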

IMPORTANT NOTE: The variable codes can change over time. In our example, they don’t. But you should double check that your variable codes are consistent across years.
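
One way to do that double check is to compare the labels attached to each code in the variable lists for two different years. The sketch below assumes both lists have already been loaded into dicts with a top-level "variables" key and a "label" per code. It compares only the last “!!”-separated piece of each label, case-insensitively, in case the labels are formatted differently between releases, so treat anything it flags as a prompt to look at the labels yourself rather than as a definitive answer. The helper names and the DPXX codes are made up for illustration.

```python
def leaf(label):
    """Last '!!'-separated piece of a label, lowercased and stripped."""
    return label.split("!!")[-1].strip().lower()


def changed_codes(codes, vars_old, vars_new):
    """Return the codes that are missing in either year or whose label leaf differs."""
    changed = []
    for code in codes:
        old = vars_old["variables"].get(code)
        new = vars_new["variables"].get(code)
        if old is None or new is None or leaf(old["label"]) != leaf(new["label"]):
            changed.append(code)
    return changed


# Hand-written samples; the DPXX codes and their labels are hypothetical.
old_year = {"variables": {
    "DP04_0001E": {"label": "Estimate!!HOUSING OCCUPANCY!!Total housing units"},
    "DPXX_0001E": {"label": "Estimate!!EXAMPLE!!Old meaning"},
}}
new_year = {"variables": {
    "DP04_0001E": {"label": "Estimate!!HOUSING OCCUPANCY!!Total housing units"},
    "DPXX_0001E": {"label": "Estimate!!EXAMPLE!!New meaning"},
}}

print(changed_codes(["DP04_0001E", "DPXX_0001E", "DPXX_0002E"], old_year, new_year))
```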

Step 3: Figure out the correct API call

We’re now ready to work with the API. We’ll navigate back to the main API page for the ACS 5-year. Under the Data Profile section, we’ll click on “Examples and Supported Geography”, and then on “examples”. This gives a list of example API calls, which we can modify for our own purposes.

Here’s an example of an API call:
https://api.census.gov/data/2019/acs/acs5/profile?get=NAME,DP04_0001E,DP04_0002E,DP04_0003E,DP04_0004E,DP04_0005E&for=tract:*&in=state:01&in=county:*

You can click on this API call and it will show you data for all tracts in Alabama. Let’s break down the API call in more detail.

https://api.census.gov/data/2019/acs/acs5/profile – This tells the API we’re looking for a data profile from the 2019 ACS 5-year data. Note that we don’t have to specify the TableID because it’s already part of the variable names. If we want to select a different year, we simply replace 2019 with the year we want. If the data isn’t available for that year, the API will return an error.

get=NAME,DP04_0001E,DP04_0002E,DP04_0003E,DP04_0004E,DP04_0005E – This tells the API which variables we want. NAME is the human-readable name of the geographic units, e.g. “Jefferson County, Alabama”. Geographic codes including the FIPS state, county, and tract codes are automatically included without requesting them.
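
The response comes back as JSON: a list of rows, with the column names in the first row and the geographic codes in the last columns. Here is a minimal sketch of turning that into one dict per tract and joining the state, county, and tract codes into the usual 11-digit tract identifier. The function name rows_to_records is my own, and the sample row contains made-up values rather than real Census figures.

```python
def rows_to_records(rows):
    """Convert the API's list of rows (header row first) into a list of dicts."""
    header, *data = rows
    records = []
    for row in data:
        record = dict(zip(header, row))
        # 2-digit state + 3-digit county + 6-digit tract = 11-digit tract GEOID.
        record["GEOID"] = record["state"] + record["county"] + record["tract"]
        records.append(record)
    return records


# Made-up sample in the shape of a parsed response, e.g. requests.get(url).json().
sample_rows = [
    ["NAME", "DP04_0001E", "state", "county", "tract"],
    ["Census Tract 1, Example County, Alabama", "100", "01", "001", "000100"],
]

records = rows_to_records(sample_rows)
print(records[0]["GEOID"], records[0]["DP04_0001E"])
```

The values are kept as strings here, so convert them to numbers yourself before doing any arithmetic.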

&for=tract:*&in=state:01&in=county:* – This tells the API the geographic units we want the data for. This particular chunk says we want all tracts in all counties in state 01. The states are referenced by FIPS code, so this is Alabama. Unfortunately, for tract-level requests the API does not let you request all states at once with state:*. It will return an error if you do that. To get the data for all states, we’ll have to write a script that loops through the states.
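
Putting the three pieces together, the whole call can be assembled from a year, a list of variable codes, and a state FIPS code. This is just a sketch of the string building for the tract-level Data Profile call described above; the name build_url is my own and nothing here contacts the API.

```python
BASE = "https://api.census.gov/data/{year}/acs/acs5/profile"


def build_url(year, variables, state_fips):
    """Build the tract-level Data Profile call for one year and one state."""
    # NAME comes first, followed by the requested variable codes.
    get_part = ",".join(["NAME", *variables])
    return (
        BASE.format(year=year)
        + f"?get={get_part}&for=tract:*&in=state:{state_fips}&in=county:*"
    )


codes = ["DP04_0001E", "DP04_0002E", "DP04_0003E", "DP04_0004E", "DP04_0005E"]
print(build_url(2019, codes, "01"))
```

Calling it with 2019 and state 01 reproduces the example call shown earlier.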

Other Example API Calls

Here are some other API calls that I’ve found useful:

Get all the variables from DP04 for all tracts in a state (warning: this returns a lot of data):
https://api.census.gov/data/2019/acs/acs5/profile?get=NAME,group(DP04)&for=tract:*&in=state:01&in=county:*

Get the data for all counties in the United States:
https://api.census.gov/data/2019/acs/acs5/profile?get=NAME,DP04_0001E,DP04_0002E,DP04_0003E,DP04_0004E,DP04_0005E&for=county:*

Step 4: Write a script to loop over years and states

The final step is to write a script that loops over all the states and years you want the data for and saves the results to disk. The following Python script downloads the data in CSV form for 2009 and 2019, for Alabama and Alaska. You can easily expand it to cover all the states and years you want.

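import csv

import requests

# Variables to request: NAME plus the five Housing Occupancy estimates.
variables = 'NAME,DP04_0001E,DP04_0002E,DP04_0003E,DP04_0004E,DP04_0005E'

for state in ['01', '02']:  # FIPS codes for Alabama and Alaska
    for year in ['2009', '2019']:
        api_call = f'https://api.census.gov/data/{year}/acs/acs5/profile?get={variables}&for=tract:*&in=state:{state}&in=county:*'
        r = requests.get(api_call)
        r.raise_for_status()  # stop on an HTTP error status instead of writing a bad file
        # The response is a JSON list of rows; the first row holds the column names.
        with open(f'DP04_{year}_{state}_by_tract.csv', 'w', newline='') as csvfile:
            csv.writer(csvfile).writerows(r.json())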
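
To expand the loop to the whole country you need the full list of state FIPS codes. Here is a small sketch that generates them and lists every (year, state, filename) combination to download, using the same file naming as the script above. The codes run from 01 to 56, with 03, 07, 14, 43, and 52 not assigned to any state; the list includes the District of Columbia (11). The name download_jobs is my own, and the 2009 to 2019 range is simply the span mentioned earlier, so some years may still return an error.

```python
# State FIPS codes run from 01 to 56, but these five numbers are not assigned to states.
UNUSED = {3, 7, 14, 43, 52}
STATE_FIPS = [f"{n:02d}" for n in range(1, 57) if n not in UNUSED]  # includes DC (11)


def download_jobs(years, states=STATE_FIPS):
    """Yield (year, state, filename) for every combination to download."""
    for state in states:
        for year in years:
            yield year, state, f"DP04_{year}_{state}_by_tract.csv"


jobs = list(download_jobs(range(2009, 2020)))
print(len(STATE_FIPS), "states,", len(jobs), "files")
print(jobs[0])
```

Each job can then be fed to the request-and-save step from the script, ideally with a short pause between requests.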

Conclusion

After years of using the Census’s cumbersome web interface, I finally took the time to learn their API. It’s saved me a lot of time and I hope this tutorial helps save you some time as well.