Does PA's new voter ID law impact groups differently by ethnicity?

If you’ve been following election news, you’re likely aware of the controversy surrounding Pennsylvania’s strict Voter ID law. The new law requires that citizens show valid photo ID (restricted to a few types) when they cast a ballot. Committee of Seventy has assembled some good resources explaining the context for this law and exactly what it requires of voters. There has been a lot of debate about the impact of the law, largely around the sheer number of people who could be turned away on election day. Last week Tom Boyer, a former journalist and volunteer for the non-partisan PA Voter ID Coalition who is blogging about his own research on this issue over at Examine Voter ID, approached us to see if we could lend a hand analyzing whether the impact of the law varied based on race or ethnicity. Being both concerned citizens and inquisitive data nerds, we were happy to oblige. We’ve conducted a preliminary analysis of the impact of the law in Philadelphia using several public datasets. Keep reading for some more background about Pennsylvania’s Voter ID law, or scroll down to our analysis.


On March 14th, as part of a trend across the country of imposing new restrictions on voting, Governor Tom Corbett signed one of the nation’s strictest pieces of voter ID legislation into law here in our home state of Pennsylvania. The law requires that voters present photo ID each and every time they go to the polls, and enacts a stringent definition of what is considered valid ID.

Last week the Commonwealth Court began hearing a complaint by plaintiffs seeking to overturn the voter ID law on the grounds that it violates Article 1, Section 5 of the Pennsylvania Constitution by depriving citizens of their constitutionally guaranteed right to vote. The plaintiffs have requested an injunction to prevent the Commonwealth from enforcing the law until the case is resolved. Additionally, the U.S. Department of Justice announced that it was investigating the legality of the law under Section 2 of the Voting Rights Act, which prohibits voting practices or procedures that discriminate on the basis of race.

While the Secretary of the Commonwealth, Carol Aichele, had previously estimated that 99 percent of Pennsylvania voters possessed the identification required to vote on election day, the state recently released two data sets that call this claim into question: first, a list of almost 759,000 voters (9.2% of the electorate) without identification issued by the Pennsylvania Department of Transportation (PennDOT); second, a list of an additional 906,000 people whose ID will have been expired for more than a year come election day, making it invalid at the polls. Inquirer reporter Bob Warner recently wrote an article outlining many of the problems with the data.

The Analysis

The Question

While we recognize that the extracts from the PennDOT database may contain errors and may not be entirely representative of who has valid ID, our analysis focused not on the number of people who may be disenfranchised, but on whether there is a pattern in who may be excluded relative to race. (The first question is extremely important, of course; it is just not something that we are in a position to establish reliably given the available data.) Because not all voters opt to report their race when registering to vote, we needed to conduct this analysis at the population level. We opted to use Philadelphia’s 1,687 ward divisions, the equivalent of “voting precincts” in other areas, because they were a relatively small geography for which we had data and are a meaningful unit in relation to elections and civic life in Philadelphia.

The Data

While we would have liked to conduct this analysis for the entire state, unfortunately there is no readily available geographic boundary file for current voting districts statewide. Given the short timeframe, we limited the analysis to Philadelphia because we had on hand or were able to quickly acquire all of the necessary data. Acquiring and reliably geocoding the entire state voter file would also have been a massive undertaking, though perhaps we’ll have the opportunity to do this in the future.

Philadelphia City Commissioner Stephanie Singer’s office provided a recent copy of the voter file for Philadelphia. This data set lists each registered voter along with attributes like name, address, date of birth, the ward division in which the voter resides, and a record of the elections in which the voter has cast a ballot.

Commissioner Singer’s office also shared the data provided by the PA Department of State about those voters who appear not to have valid ID. The first list is of Voter IDs corresponding to people whose names did not appear in the PennDOT database. These people are considered to have “no ID” although it’s possible that they possess a form of valid ID not issued by PennDOT, such as a passport or a state university ID card. The second list was of Voter IDs of people whose PennDOT ID expired prior to November 6th, 2011, and would thus be invalid at the polls. The “expired ID” group and the “no ID” group are mutually exclusive.

Finally, we used the Census 2010 Redistricting Data for demographic information, namely race and ethnicity. We had previously aggregated this block-level data to Philadelphia’s ward divisions (boundary files available on Open Data Philly) to support Fix Philly Districts, a legislative redistricting competition. Because we were interested in the voting age population, we used data for the population 18 and older.

The Method

The first step was to join “no ID” and “expired ID” information to the voter file, a task accomplished in Microsoft Access using the unique Voter ID fields common to the three data sets. We then narrowed our analysis to “active” voters: people who have cast a ballot in the past four years.

This data set comprised 868,648 records for active voters, of whom 135,859 (15.6%) were listed as having no ID and an additional 146,742 (16.9%) were listed as having an expired ID, for a total of 282,601 registered voters (32.5%) who may be without valid ID. There were only 26 voters for whom ward division information was not available; they were excluded from this analysis.
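For readers who would rather script this step than use Access, the join and the “active” filter can be sketched in plain Python. This is only an illustration under stated assumptions, not the workflow we ran: the field names (`voter_id`, `division`, `last_voted_year`), the toy records and the `cutoff_year` used to define “active” are all hypothetical.

```python
# Hypothetical toy records; the real voter file has many more fields.
voter_file = [
    {"voter_id": "A1", "division": "0101", "last_voted_year": 2011},
    {"voter_id": "A2", "division": "0101", "last_voted_year": 2004},
    {"voter_id": "A3", "division": "0102", "last_voted_year": 2010},
    {"voter_id": "A4", "division": "0102", "last_voted_year": 2012},
]
no_id_list = {"A1", "A2"}    # voter IDs with no match in the PennDOT data
expired_id_list = {"A4"}     # voter IDs whose PennDOT ID has expired


def flag_active_voters(voters, no_id, expired, cutoff_year):
    """Keep voters who last voted in or after cutoff_year and attach ID flags."""
    flagged = []
    for voter in voters:
        if voter["last_voted_year"] < cutoff_year:
            continue  # not "active" under the assumed cutoff
        flagged.append({
            **voter,
            "no_id": voter["voter_id"] in no_id,
            "expired_id": voter["voter_id"] in expired,
        })
    return flagged


active = flag_active_voters(voter_file, no_id_list, expired_id_list, cutoff_year=2008)
share_no_id = sum(v["no_id"] for v in active) / len(active)
share_expired = sum(v["expired_id"] for v in active) / len(active)
```

Applied to the real counts, the same division gives 135,859 / 868,648 ≈ 15.6% for “no ID” and 146,742 / 868,648 ≈ 16.9% for “expired ID”.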

We then summarized the data by ward division, generating a table containing several fields:

  • Number of active voters
  • Number of active voters appearing on the “no ID” list
  • Number of active voters appearing on the “expired ID” list
  • Percent of active voters appearing on the “no ID” list
  • Percent of active voters appearing on the “expired ID” list
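The summary table above amounts to a group-by on ward division. A minimal Python sketch follows; the record layout (`division`, `no_id`, `expired_id`) and the sample rows are assumptions made for illustration, not the actual structure of the voter file.

```python
from collections import defaultdict


def summarize_by_division(active_voters):
    """Count active voters and ID flags per ward division, then add percentages."""
    counts = defaultdict(lambda: {"active": 0, "no_id": 0, "expired_id": 0})
    for voter in active_voters:
        row = counts[voter["division"]]
        row["active"] += 1
        row["no_id"] += int(voter["no_id"])
        row["expired_id"] += int(voter["expired_id"])

    summary = {}
    for division, row in counts.items():
        summary[division] = {
            **row,
            "pct_no_id": 100.0 * row["no_id"] / row["active"],
            "pct_expired_id": 100.0 * row["expired_id"] / row["active"],
        }
    return summary


# Hypothetical flagged records for two ward divisions.
sample_voters = [
    {"division": "0101", "no_id": True, "expired_id": False},
    {"division": "0101", "no_id": False, "expired_id": False},
    {"division": "0102", "no_id": False, "expired_id": True},
    {"division": "0102", "no_id": False, "expired_id": True},
    {"division": "0102", "no_id": False, "expired_id": False},
    {"division": "0102", "no_id": True, "expired_id": False},
]
division_summary = summarize_by_division(sample_voters)
```

Each division always has at least one active voter in this structure, so the percentage division cannot fail on a zero denominator.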

In ArcGIS we used the Census data to calculate the percentage of the voting age population in each ward division that was white, black/African-American, Hispanic/Latino, and Asian. We imported the voter information table into ArcMap and used the ward division number to join the voter data to the ward division shapefile with demographic information. The thematic map below (created in TileMill and hosted on MapBox) visualizes the percent of voters in each ward division who appear on either the “no ID” or “expired ID” list.


[Interactive map: percent of voters in each ward division who appear on either the “no ID” or “expired ID” list]

The map makes clear that the spatial distribution of those who lack ID is non-random. Voters without ID are heavily concentrated around the University of Pennsylvania and Drexel University in West Philadelphia, as well as parts of North, West and Southwest Philadelphia. The rates of voters without ID are relatively low in the Northeast, Northwest, Southeast and Center City.

While there was clearly a spatial pattern to the data, the next step was to explore whether the racial composition of the ward divisions might explain the outcome variable, percent of the population without valid ID.

We generated scatterplots (Figures 2 through 5) and calculated correlation coefficients (Pearson’s r) in Excel to visualize and measure the magnitude and direction of the relationship between the proportion of a ward division’s voting age population that is of a given ethnicity and the proportion of registered voters who lack valid ID. Full-screen interactive scatterplots generated with Tableau Public are linked from each of the chart images.
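Pearson’s r is simple enough to compute outside of Excel. The sketch below implements the standard formula (covariance divided by the product of the standard deviations); the two short lists are made-up values standing in for ward division percentages, not our data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up example: percent white vs. percent lacking valid ID in five divisions.
pct_white = [10, 30, 50, 70, 90]
pct_lacking_id = [45, 40, 30, 25, 15]
r = pearson_r(pct_white, pct_lacking_id)  # negative for this toy data
```

The sign of r gives the direction of the relationship and its absolute value the strength, which is how the coefficients reported in this post should be read.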



The correlation coefficients were as follows:

  • White: – 0.6473
  • Black: + 0.5025
  • Latino: + 0.2575
  • Asian: – 0.1076

These linear regression models indicate that there is a strong negative correlation between the percent of white adults and the percent of voters lacking valid ID; that is, as the proportion of the population that is white increases, the proportion of voters with ID problems decreases. Conversely, there is a strong positive correlation between percent black and percent lacking valid ID: the greater the proportion of a ward division that is black, the greater the proportion of voters that may be barred from the polls. Similarly, there is a small to medium positive correlation for the Latino population and a small negative correlation for the Asian population.


It was apparent from both the scatterplots and the maps that there was a small number of ward divisions with extraordinarily high proportions of the population lacking valid ID, namely in West Philadelphia in immediate proximity to the campuses of the University of Pennsylvania and Drexel University.

In OpenGeoDa, we explored the data further, generating a box plot that enabled us to identify nine ward divisions that were statistical outliers; these are mapped in Figure 6.
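A box plot flags a value as an outlier when it lies more than some multiple of the interquartile range (IQR) beyond the quartiles. The Python sketch below shows that rule for the upper fence. The `hinge` parameter, the division labels and the values are assumptions for illustration, and quartile conventions differ slightly between tools, so results from `statistics.quantiles` may not match OpenGeoDa exactly.

```python
import statistics


def upper_outliers(values_by_division, hinge=1.5):
    """Return divisions whose value exceeds Q3 + hinge * IQR, sorted by label."""
    q1, _median, q3 = statistics.quantiles(list(values_by_division.values()), n=4)
    upper_fence = q3 + hinge * (q3 - q1)
    return sorted(d for d, value in values_by_division.items() if value > upper_fence)


# Made-up percentages of voters lacking valid ID in eight divisions.
pct_lacking_id = {
    "a": 10, "b": 12, "c": 14, "d": 16,
    "e": 18, "f": 20, "g": 22, "h": 95,
}
flagged = upper_outliers(pct_lacking_id)             # conventional 1.5 * IQR fence
flagged_strict = upper_outliers(pct_lacking_id, 3)   # stricter 3 * IQR fence
```

Raising `hinge` from 1.5 to 3 restricts the result to more extreme values, which is the usual distinction between ordinary and extreme outliers on a box plot.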


We were then able to quickly regenerate the scatterplots as correlation plots, recalculating the correlation coefficients exclusive of the outliers. With these outliers excluded, the correlations are even stronger:

  • White: – 0.6623
  • Black: + 0.5266
  • Latino: + 0.2638
  • Asian: – 0.1670

It’s worth noting the distribution of the data, particularly for the Latino and Asian populations. While the data for the white and black shares of the population is fairly evenly distributed (though there is clustering at the high and low ends, indicating residential segregation), the data points for Latinos and Asians are heavily clustered near zero because there are many ward divisions without substantial numbers of these groups. If you play with the filters on the interactive scatterplots you can see the trendline change. For instance, if you exclude from the analysis those ward divisions where Asians make up less than 5 percent of the population, the slope of the trendline shifts from negative to positive; the direction of correlation shifts as well, indicating that the effect of increasing Asian population share on lacking valid voter ID is similar to that of blacks and Latinos.
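To see how a cluster of points near zero can drive the sign of a trendline, here is a small Python sketch that fits a least-squares slope with and without a minimum population share. The numbers are invented purely to reproduce the kind of sign change described above; they are not drawn from our data, and `min_share` is a hypothetical parameter name.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares trendline of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values have zero variance")
    return sxy / sxx


def slope_above_share(xs, ys, min_share):
    """Refit the trendline using only points whose x is at least min_share."""
    kept = [(x, y) for x, y in zip(xs, ys) if x >= min_share]
    return ols_slope([x for x, _ in kept], [y for _, y in kept])


# Invented data: three divisions near zero share, three with larger shares.
pct_asian = [1, 2, 3, 10, 20, 30]
pct_lacking_id = [50, 45, 40, 10, 15, 20]

slope_all = ols_slope(pct_asian, pct_lacking_id)                        # negative
slope_filtered = slope_above_share(pct_asian, pct_lacking_id, min_share=5)  # positive
```

In this toy example the three low-share divisions dominate the overall fit, and dropping them reverses the slope. This is why the clustering near zero deserves attention before reading too much into the Latino and Asian coefficients.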


Based on these linear regression models, it appears that Pennsylvania’s new strict photo ID requirement may be in effect a racially discriminatory voting procedure. That said, this analysis is somewhat limited in scope. We only had data for the city of Philadelphia, and it’s possible that with statewide data other patterns may become apparent. It would not be surprising if patterns in rural or even suburban areas were different.

We ran a simple linear regression model; there may be additional variables, such as educational attainment or household income, that better explain the observations and a multiple regression model may offer an even more compelling explanation. This is also a model that is being run on aggregated data; any conclusions we make characterize patterns at the ward division level and don’t enable us to make definitive statements about individuals (read all about the ecological fallacy).

All that being said, we hope that this analysis was able to cast some new light on the potential impact of our state’s new voter ID law, beyond the already striking number of Pennsylvanians who could be disenfranchised by the new requirements.


A number of people have asked questions regarding the quality of the data used in this study. We did not initially address this question at length, but we will now go into more detail about the two data sets that we used and the patterns that we see in each.

“No ID”


The first set of data released by the state was a list of those who appear on voter rolls but not in the PennDOT database; we call this the “no ID” list. In Philadelphia, this list included 135,859 active voters. There are a number of reasons why a match might not have been made with the PennDOT database. As we mentioned previously, Bob Warner at the Inquirer had a good summary of some of the reasons:

In addition to the stated problem with people who use different first names on different documents, it appears the state’s computers had problems distinguishing names containing spaces, like Mary Ellen, or Van Dyke; names with hyphens, like Olivia Newton-John; and names that computers sometimes spell with spaces, like Mc Dougall.

In Philadelphia alone, more than 10,000 people whose names begin with “Mc” were listed as not having PennDot ID. They included state Supreme Court Justice Seamus P. McCaffery, a driver whose name is spelled Mc Caffery on the city’s voter rolls.

Names with apostrophes, like O’Brien and O’Neill, were especially troublesome because PennDot’s computer system doesn’t use apostrophes, according to David Burgess, a Department of State deputy secretary in charge of computer operations.
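To illustrate why spaces, hyphens and apostrophes cause mismatches, here is a hypothetical Python sketch of the kind of normalization a matching routine could apply before comparing names. We do not know how the state actually performed its match; the function below is our own assumption, shown only to make the problem concrete.

```python
import re


def normalize_name(name):
    """Uppercase a name and drop spaces, hyphens, apostrophes and periods."""
    return re.sub(r"[\s\-'’.]", "", name).upper()


# Spellings like those described in the article collapse to one key.
same_person = normalize_name("Mc Caffery") == normalize_name("McCaffery")
key_obrien = normalize_name("O’Brien")
key_hyphen = normalize_name("Newton-John")
```

A comparison of the raw strings treats “Mc Caffery” and “McCaffery” as different people, while the normalized keys match. Aggressive normalization carries its own risk of false matches, so it is not a complete fix.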

While these mismatches may seem insubstantial, the voter ID law requires that the names on the ID and on the voter rolls must be “substantially conforming” without defining what this means. This gives poll workers a great deal of latitude to determine whether someone may or may not vote. According to a survey and analysis commissioned by the ACLU and conducted by political science professor and survey and elections expert Matt Barreto, 97.8% of voters believe that they have a valid ID. The researchers asked the question “A lot of people go by a nickname or change their name when they get married. Is the name that is printed on your {driver’s license / official photo ID} your full legal name, exactly as it would appear on the Pennsylvania voter registration record, or is there a difference?” and found that “an additional 4.3% of respondents who had an up-to-date ID, reported that their name listed on their ID did not match that which would appear on the voter registration records” (9).

This is not to say that there aren’t substantial problems with the data. The team at the Inquirer did some additional investigation, telephoning some voters who appeared on the list to verify whether they had ID. They found that approximately 75% of those with whom they spoke reported that they did in fact possess PennDOT ID, while 25% did not. Despite these problems, we were interested in including the data in our analysis to see if it revealed spatial or statistical patterns.




The map above reveals the first striking pattern: the ward divisions with the highest proportion of voters on the “no ID” list are clustered around colleges and universities. This visual inspection is borne out by the statistical analysis: these are the ward divisions with values more than 3 IQR above the upper quartile mark. The cluster around University City, where the University of Pennsylvania, Drexel University and the University of the Sciences are located, is by far the most prominent, but additional clusters are visible around Temple University, La Salle University, Saint Joseph’s University and Philadelphia University; there is another cluster just south of City Hall that may or may not be related to the University of the Arts.

This is not a surprising finding given that college students may rely primarily on their student ID cards and not have a Pennsylvania driver’s license. This post highlights two reasons why students may lack PennDOT ID:

While UPenn’s ID cards conform to the requirements of the new law, those of several of the other universities currently do not because they lack an expiration date, according to a PennPIRG document. It’s possible that students may have another form of valid ID, such as a passport; an out-of-state driver’s license is not valid. This is an interesting finding in light of the fact that college students have historically been the targets of voter suppression efforts (scroll down to “Student Voting Barriers”), including one documented incident at Drexel University in 2008.



Aside from the clusters around the city’s universities, visual inspection suggests slightly higher rates in North and West Philadelphia and lower rates in the Northeast and Northwest. Excluding those clusters around the universities reveals a small correlation between percent with no ID and percent white and black, and essentially no correlation with percent Latino or Asian:

  • White: – 0.1276
  • Black: + 0.1255
  • Latino: + 0.0093
  • Asian: – 0.0522

If spelling errors and mismatches like the ones described by Warner were randomly distributed across the city we would expect no correlation of percent “no ID” and the ethnic composition of a ward division. That we find a small correlation suggests that there is some underlying pattern.


Expired ID


The new law requires that a PennDOT-issued ID must not have expired in the past year in order to be valid as a form of ID at the polls. In addition to the “no ID” list described above, in late July the state released to county officials a second, larger list of voters whose IDs had expired prior to November 6th, 2011; this list and the “no ID” list are mutually exclusive. In Philadelphia, this list included 146,742 active voters (16.9% of the total). This list represents registered voters who were positive matches in the PennDOT database, so presumably the data quality issues discussed above do not apply.

Many people are not aware of the new voter ID law’s requirements, and a substantial portion erroneously believe that they have valid ID. The survey by Barreto found that the vast majority of eligible voters believed that they had a photo ID (97.8%), but “when asked follow up questions about whether the photo ID has an expiration date, and is current” they found that “among all eligible voters 10.7% lack a non-expired PennDOT ID (driver’s license or non-driver’s ID issued by PennDOT), while 9.3% of registered voters, and 8.6% of 2008 voters lack a non-expired PennDOT ID” (24).

These survey estimates match up fairly well with the roughly 11% of voters statewide that appear on the expired ID list, but Philadelphia’s 16.9% figure outstrips the state and all other counties by a wide margin. Just as the lack of ID in Philadelphia is disproportionately high relative to the rest of the state, we were interested in seeing if particular areas of the city showed disproportionately high rates of expired ID.




In terms of the distribution of values, there is only one statistical outlier (more than 1.5 IQR above the upper quartile): ward division 2520, south of Kensington & Allegheny Avenues. Ward divisions with high rates of expired IDs again appear clustered in North, West and Southwest Philadelphia. Especially low rates can be observed in the Northeast, Northwest and most of Center City and (inverting the trend seen with “no ID”) around the universities.

This data set exhibits a meaningful correlation with race, with varying degrees of strength (strong in the case of black and white, medium for Latino and small for Asian):

  • White: – 0.7360
  • Black: + 0.5778
  • Latino: + 0.3240
  • Asian: – 0.2123


Breaking out the data into the “no ID” and “expired ID” categories revealed what appeared to be three trends:
  1. A very strong association of “no ID” with student populations.
  2. A weak correlation between “no ID” and race, possibly explained by more evenly distributed problems with matching names in the PennDOT database.
  3. A strong correlation between “expired ID” and race.
Looking at these data sets separately introduces some nuance into the analysis, but doesn’t lead to fundamentally different conclusions. We had completed these sub-analyses prior to our initial post, but felt that offering a combined analysis would be the most succinct approach. The conclusion remains that African-American and Latino communities are disproportionately affected by the voter ID law, most prominently as it relates to expired PennDOT ID and to a lesser degree with regard to not appearing in the PennDOT database.