Summer of Maps: Accounting for Uncertainty with Empirical Bayes Smoothing

Now in its second year, Azavea’s Summer of Maps Program has become an important resource for non-profits and student GIS analysts alike. Non-profits receive pro bono spatial analysis work that can enhance their business decision-making processes and programmatic activities, while students benefit from Azavea mentors’ experience and expertise. This year, three fellows worked on projects for six organizations that spanned a variety of topics and geographic regions. This blog series documents some of their accomplishments and challenges during their fellowship. Our 2013 sponsors, Esri and Tri-Co Digital Humanities, helped make this program possible. For more information about the program, please fill out the form on the Summer of Maps page.


Accounting for Uncertainty with Empirical Bayes Smoothing

One of the inherent difficulties of working with Census data is uncertainty. The Census itself is commonly thought of as the survey taken every ten years by just about every household in the United States, called the Decennial Census. However, in the interest of having more up-to-date information in the face of a rapidly changing nation, the Census Bureau also produces shorter “American Community Surveys” in one-, three-, and five-year increments. While this information is hopefully more reflective of current trends, the shorter survey period and smaller sample sizes mean that the level of uncertainty and the margins of error are elevated.

Part of my Summer of Maps tenure at Azavea involved working on a project for the Greater Philadelphia Coalition Against Hunger to identify populations in Philadelphia that were both vulnerable to hunger and eligible for SNAP benefits. Many of the best measures for assessing these two factors are only available through the American Community Survey, so uncertainty is unavoidable. While it’s possible to ignore or throw out high-error values, it’s also possible to strengthen uncertain estimates and weaken outliers through “rate smoothing,” specifically Empirical Bayes Smoothing.

Empirical Bayes Smoothing uses the population in a region as a measure of the confidence in the data, with higher populations in a given area lending a higher confidence to the estimated number of events in that location. Empirical Bayes Smoothing leaves estimates for areas with low margins of error largely alone, but nudges estimates in regions with high margins of error closer to the global average of the event rate. For the Hunger Coalition, the event being measured is the number of people who fall below an income-to-poverty ratio (IPR) of 1.5, which determines their eligibility for SNAP (formerly known as Food Stamp) benefits. The IPR divides a household’s income by the poverty threshold appropriate to its size. For example, an IPR of 1 means household income is equal to the poverty line, and an IPR of 2 refers to a household that earns twice the poverty threshold.
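To make the idea concrete, here is a minimal Python sketch of one common way to compute Empirical Bayes rates, a method-of-moments formulation. I am assuming this formulation for illustration; I have not confirmed that it matches GeoDa's internal calculation line for line. The function name eb_smooth and the three tracts are made up.

```python
def eb_smooth(events, base):
    """Shrink raw rates toward the global rate; low-population areas shrink most."""
    if len(events) != len(base):
        raise ValueError("events and base must have the same length")
    if any(p <= 0 for p in base):
        raise ValueError("base variable must be positive for every area")

    n = len(base)
    total_pop = sum(base)
    global_rate = sum(events) / total_pop
    raw = [e / p for e, p in zip(events, base)]

    # Population-weighted spread of the raw rates, minus the spread
    # expected from sampling noise alone; clamp at zero if negative.
    between = sum(p * (r - global_rate) ** 2 for p, r in zip(base, raw)) / total_pop
    variance = max(between - global_rate / (total_pop / n), 0.0)

    smoothed = []
    for p, r in zip(base, raw):
        denom = variance + global_rate / p
        weight = variance / denom if denom > 0 else 0.0  # near 1 for large p
        smoothed.append(weight * r + (1 - weight) * global_rate)
    return smoothed


# Hypothetical tracts: people below an IPR of 1.5, and total population.
events = [30, 400, 900]
base = [50, 2000, 3000]

raw = [e / p for e, p in zip(events, base)]
smoothed = eb_smooth(events, base)
for r, s, p in zip(raw, smoothed, base):
    print(f"population {p:>5}: raw {r:.3f} -> smoothed {s:.3f}")
```

In this made-up data, the tract with only 50 residents has a high raw rate, and it is pulled noticeably toward the citywide rate, while the two large tracts barely move.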

The simplest available GIS implementation of Empirical Bayes Smoothing is in Open GeoDa, a free spatial statistical tool developed by Arizona State University and Luc Anselin, a prominent statistician. GeoDa has many advanced statistical GIS functions, some of which aren’t even available in ArcGIS. To use Empirical Bayes Smoothing:

    1. Download GeoDa with a registered (free) account.
    2. Open a shapefile with an event and base variable.
        Event Example: Number of people per tract living below an IPR of 1.5
        Base Example: Total population of each tract
    3. Right click the map and choose “Select Rates”.
    4. Select Empirical Bayes.
    5. Select Event and Base variables, press OK.
    6. Right click the map and choose “Save Rates”.
    7. Click Add Variable to name a new field, press OK.

Notes: The base variable must not contain any zero values. If the event variables are extremely small (single digits) for many areas prior to smoothing, the calculation may produce a homogeneous map due to negative estimates of variance, which means that the calculated rates are zero. This can sometimes be fixed by multiplying the event and base variable fields by the same factor prior to using Empirical Bayes Smoothing. Scaling leaves the raw rates unchanged, since each rate is a ratio of the two fields, but it makes every area look more populous to the calculation, so the smoothed rates will be pulled less strongly toward the global average than they would be at the original scale.
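The negative-variance problem is easy to reproduce outside of GeoDa. The sketch below assumes the same method-of-moments variance estimate commonly used for Empirical Bayes rates (an assumption on my part, not GeoDa's confirmed code) and uses four made-up tracts with single-digit event counts. The function name variance_estimate is hypothetical.

```python
def variance_estimate(events, base):
    """Unclamped method-of-moments estimate of the variance of the true rates."""
    n = len(base)
    total_pop = sum(base)
    global_rate = sum(events) / total_pop
    # Population-weighted spread of the raw rates around the global rate.
    between = sum(
        p * (e / p - global_rate) ** 2 for e, p in zip(events, base)
    ) / total_pop
    # Subtract the spread expected from sampling noise alone.
    return between - global_rate / (total_pop / n)


# Hypothetical tracts with very small event counts.
events = [1, 0, 2, 1]
base = [40, 35, 50, 45]

factor = 10
scaled_events = [e * factor for e in events]
scaled_base = [p * factor for p in base]

raw_rates = [e / p for e, p in zip(events, base)]
scaled_rates = [e / p for e, p in zip(scaled_events, scaled_base)]

before = variance_estimate(events, base)
after = variance_estimate(scaled_events, scaled_base)

print("variance estimate before scaling is negative:", before < 0)
print("variance estimate after scaling is positive:", after > 0)
```

The raw rates are identical before and after scaling. Only the noise term shrinks, which is why the variance estimate flips from negative to positive in this example.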

There are two other smoothing techniques within GeoDa that function differently from Empirical Bayes. The first, Spatial Empirical Bayes, uses local rather than global estimates of the event variable. These estimates are based on a weighting scheme that requires detailed knowledge of how a study area varies at a small scale. The second, Spatial Rate smoothing, uses regional instead of global or local estimates, and its estimates are also based on a selected weighting scheme.
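As a rough illustration of the second technique, the sketch below computes a spatial rate by pooling each tract's events and population with those of its neighbors. The neighbor lists stand in for the weighting scheme, and they, the function name spatial_rate, and the data are all hypothetical. Treat this as a simplified picture of the idea rather than GeoDa's exact procedure.

```python
def spatial_rate(events, base, neighbors):
    """Rate for each area computed over the area plus its listed neighbors."""
    rates = []
    for i in range(len(base)):
        window = [i] + list(neighbors[i])  # the area itself and its neighbors
        pooled_events = sum(events[j] for j in window)
        pooled_base = sum(base[j] for j in window)
        rates.append(pooled_events / pooled_base)
    return rates


# Four hypothetical tracts in a row; each borders the tract beside it.
events = [5, 40, 60, 2]
base = [20, 200, 300, 10]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

rates = spatial_rate(events, base, neighbors)
for i, rate in enumerate(rates):
    print(f"tract {i}: raw {events[i] / base[i]:.3f} -> spatial rate {rate:.3f}")
```

Because the result depends entirely on which tracts count as neighbors, the choice of weighting scheme matters far more here than in the global Empirical Bayes approach.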

Using Empirical Bayes Smoothing on American Community Survey data for all of Philadelphia for the Hunger Coalition will result in stronger estimates and smaller margins of error. This will create a more complete picture of SNAP eligibility within Philadelphia, rather than a patchwork of uncertain data.