Despite official estimates, many parts of the US have populations that vary wildly from day to night. Think of the the repeating cycle of sleepy suburbs emptying onto highways at 8AM every morning and the city centers emptying back out at 5. There are really two different population measurements then–nighttime population consisting of permanent residents (where people live), and daytime population consisting of those who spend all day in a given location for work. This is an important difference, especially for organizations that want to find out where people are during the day, rather than where they call home.
While the Census Bureau provides estimates of daytime population at the county level through the American Fact Finder, anything smaller requires manual processing. County level daytime population data isn’t all that helpful, either. For instance, Cook County encompasses the entire city of Chicago and its nearest suburbs. This large geographic area obscures movement patterns of the population within Chicago.
In 2014, Summer of Maps alum Tim St. Onge detailed how he calculated daytime population for the city of New Haven, Connectictut. We’ll be using the same principles with the Census Transportation Planning Product’s Census Tract Flows database. Because the data file is too large for a standard spreadsheet program, we’ll be working in a PostGRES database. Check out our previous blog on getting started with PostGRES / PostGIS using the OpenGeoSuite.
The Census Bureau has two recommended methods for calculating daytime population. We’ll use the second method because it’s a little more simple:
Daytime Population = Total Resident Population + Total Workers Working in Area – Total Workers Living in Area
Step 1: Download Census Transportation Planning Product’s (CTTP) Census Tract Flows (Total Workers Commuting In, ie Daytime Workers)
- Download the 151mb Access Database of CTTP Census Tract Flows
- Export the data from the Access Database as a CSV. The resulting database should have around 4 million rows (see why we don’t use Excel?)
- Import the data into PostGRES. I used the following code to create and format an empty table with fields for each column in the CSV:
CREATE TABLE work_pop (res_st_fips integer, res_cty_fips integer, res_tract_fips integer, work_state_fips integer, work_cty_fips integer, work_tract_fips integer, est integer, moe double precision);
I then used this code in the terminal to copy the CSV to the database (requires PSQL add-on which usually comes with OpenGeoSuite):
sql COPY work_pop FROM '/Users/user/ACSdata/Tract-flows.txt' DELIMITER ',' CSV;s
The CTTP data set stores the uniqe GEOIDs for each tract, county, and state as numbers. This means that any leading zeros necessary to create a valid 11-digit GEOID field that will match downloaded data from the ACS have been stripped out. We need to add them back in. We’ll do this by converting the numbers to text, adding leading zeroes, and concatenating the state, county, and tract GEOID fields together. Learn more about how to construct an 11-digit GEOID for tracts here.
This Census Tract Flows database accounts for every single unique instance of tract-to-tract commutes. This is why the database has 4 million rows, despite the fact that there are only about 74,000 Census Tracts in the US. We’ll need to sum the counts of all the workers flowing into a given tract as a work destination. For example, tract 839100 in Chicago receives commuters from a whopping 2,202 other tracts.
- Run the query below on the PostGRES table you created in Step 1.3 and and export the results as a new table to save the result. The query will convert the numeric FIPS columns to text, add the leading zeros where necessary, and concatenate them together into an 11-digit geoid2wk(for workers) and geoid2rs (for residents), SUM the counts of workers heading to each destination tract, and then group the tracts by their common id, geoid2wk.
SELECT SUM(workers_daytime), geoid2wk FROM (SELECT workers_daytime, strstwk||strctywk||strtractwk as geoid2wk, strstrs||strctyrs||strtractrs as geoid2rs FROM (SELECT workers_daytime, to_char(res_st_fips,'FM00') as strstrs, to_char(res_cty_fips,'FM000') as strctyrs, to_char(res_tract_fips,'FM000000') as strtractrs, to_char(work_state_fips,'FM00') as strstwk, to_char(work_cty_fips,'FM000') as strctywk, to_char(work_tract_fips,'FM000000') as strtractwk FROM work_pop) as sub ) as sub2 GROUP by geoid2wk
This will result in a table that has about 74K records. We’ll be using the resulting table with additional ACS data to calculate daytime population.
- Go to http://factfinder.census.gov
- Choose “Download Center”, hit Next
- Choose the top option, “I know the dataset or table(s) that I want to download”, hit Next
- Select American Community Survey from the dropdown, select “Decennial Census” from the list, and then select “2010 SF1 100% Data”. Press “Add to your selections”. Hit Next
- Under Geographic Type select “Census Tract – 140” from the geography dropdown, select “All Census Tracts within United States” in the selection box, hit “Add to Selections”, hit Next
- In the search box type: “Total Population”. Click on the checkbox next to the TOTAL POPULATION table and then hit next, OK, and download to get the data.
- Repeat steps 1-3.
- Select “American Community Survey” from the dropdown, and then “2013-ACS 5-year estimates”.
- Click Add to selections, and then hit next.
- Select Census Tract – 140 from the geography dropdown, select “All Census Tracts within United States” in the selection box, hit “Add to Selections”, hit Next
- In the search box, type b08007
- Click the checkmark next to b08007 and download.
Note: Using 2013 5-year ACS resident worker estimates with 2010 CTTP daytime worker estimates and 2010 population doesn’t make perfect sense. Ideally all the data would be from the same year. Since the last time the CTTP data was published was 2010, it would be better to use all 2010 data. It was an oversight in the original version of the project to use ACS Table B08007 for 2013 and not 2010. While the methodology for this project is pretty sound, please take the resulting numbers with a grain of salt as the daytime worker numbers are from two years after the resident population and commuting numbers.
- Load the ACS tables from Step 3 into your database or a GIS so you can get to work. Now that we’ve pared down the daytime workers data to a more manageable 74,000 records, we can join the tables in a spreadsheet program or a GIS program. Otherwise, the next steps assume you’re using PostGRES.
- Add all the data from both ACS tables (Total Workers Living in Area, Total Resident Population) and the Daytime Worker table into the same table. Join the tables based on their matching 11-digit GEOID columns. Adapt the following PostGRES query to your specific tables:
UPDATE censustracts SET destinationcolumn = ( SELECT sourcecolumn FROM sourcecensusdata WHERE geoid10 = censustracts.geoid10)
- Once all three fields (Total Daytime Workers, Total Resident Population, Total Workers Living in Area) are in the same table add a new column to store the result of calculation. Maybe something like “poptotal_day”.
- Perform the daytime population calculation: Daytime Population (Census Bureau Method 2) = Total Resident Population + total workers – total resident workers
Here’s an example PostGRES query:
UPDATE censustracts SET poptotal_day = ( SELECT population_total + workers_daytime - workers_resident FROM censustracts)
If everything has gone well, the poptotal_day column should now have an estimate of the daytime population for every single census tract in the united states! We’re just missing one thing: spatial data! We’ve been using Census Tracts and Census Bureau estimate tables this whole time, but we have no way to display the data on a map without using spatial representations of the Census Tracts.
To display the results on a map, you need Census Tract spatial data. Unfortunately, geographic data for all Census Tracts in the entire country are difficult to find in bulk. The Census Bureau offers TIGER shapefile downloads, but they’re broken down by state, so a user would have to download a file for each state and then merge them afterwards. Assuming you can find the data or combine the files from the ACS, you can join the calculated data sets to geographic shapefiles or geoJSONs with SQL or a GIS program like QGIS or ArcGIS. The join should be based on the matching 11-character GEOIDs in the calculated daytime population, and the Census Tracts shapefiles.
Visit this link to download both the finished daytime population data for 2013 and the geographic files for all Census Tracts in the US, and check out the results below:
CartoDB truncated the shapefile field names, so if you download the dataset, here are the key fields and their descriptions:
tot_pop_da = total daytime population
workers_to = 2010 Daytime Workers (derived from summarizing the CTTP table)
workers_re = 2013 Resident Workers (from ACS 2013 5 year table B08007)
total_popu = 2010 total population (SF1 100%)