If you’ve been following election news, you’re likely aware of the controversy surrounding Pennsylvania’s strict Voter ID law. The new law requires that citizens show valid photo ID (restricted to a few types) when they cast a ballot. Committee of Seventy has assembled some good resources explaining the context for this law and exactly what it requires of voters. There has been a lot of debate about the impact of the law, largely around the sheer number of people who could be turned away on election day. Last week Tom Boyer, a former journalist and volunteer for the non-partisan PA Voter ID Coalition who is blogging about his own research on this issue over at Examine Voter ID, approached us to see if we could lend a hand analyzing whether the impact of the law varied based on race or ethnicity. Being both concerned citizens and inquisitive data nerds, we were happy to oblige. We’ve conducted a preliminary analysis of the impact of the law in Philadelphia using several public datasets. Keep reading for some more background about Pennsylvania’s Voter ID law, or scroll down to our analysis.
Background
On March 14th, as part of a trend across the country of imposing new restrictions on voting, Governor Tom Corbett signed one of the nation’s strictest voter ID bills into law here in our home state of Pennsylvania. The law requires that voters present photo ID each and every time they go to the polls, and enacts a stringent definition of what is considered valid ID.
Last week the Commonwealth Court began hearing a complaint by plaintiffs seeking to overturn the voter ID law on the grounds that it violates Article 1, Section 5 of the Pennsylvania Constitution by depriving citizens of their constitutionally guaranteed right to vote. The plaintiffs have requested an injunction to prevent the Commonwealth from enforcing the law until the case is resolved. Additionally, the U.S. Department of Justice announced that it was investigating the legality of the law under Section 2 of the Voting Rights Act, which prohibits voting practices or procedures that discriminate on the basis of race.
While the Secretary of the Commonwealth, Carol Aichele, had previously estimated that 99 percent of Pennsylvania voters possessed the identification required to vote on election day, the state recently released two data sets that call this claim into question: first, a list of almost 759,000 voters (9.2% of the electorate) without identification issued by the Pennsylvania Department of Transportation (PennDOT); second, a list of an additional 906,000 people whose ID will have been expired for more than a year come election day, making it invalid at the polls. Inquirer reporter Bob Warner recently wrote an article outlining many of the problems with the data.
The Analysis
The Question
While we recognize that the extracts from the PennDOT database may contain errors and may not be entirely representative of who has valid ID, our analysis focused not on the number of people who may be disenfranchised, but on whether there is a pattern relative to race in who may be excluded. (The first question is extremely important, of course; it’s just not something that we are in a position to establish reliably given the available data.) Because not all voters opt to report their race when registering to vote, we needed to conduct this analysis at the population level. We opted to use Philadelphia’s 1,687 ward divisions (the equivalent of “voting precincts” in other areas) because they were a relatively small geography for which we had data and are a meaningful unit in relation to elections and civic life in Philadelphia.
The Data
While we would have liked to conduct this analysis for the entire state, unfortunately there is no readily available geographic boundary file for current voting districts statewide. Given the short timeframe, we limited the analysis to Philadelphia because we had on hand or were able to quickly acquire all of the necessary data. Acquiring and reliably geocoding the entire state voter file would also have been a massive undertaking, though perhaps we’ll have the opportunity to do this in the future.
Philadelphia City Commissioner Stephanie Singer’s office provided a recent copy of the voter file for Philadelphia. This data set lists each registered voter along with attributes like name, address, date of birth, the ward division in which the voter resides, and a record of the elections in which the voter has cast a ballot.
Commissioner Singer’s office also shared the data provided by the PA Department of State about those voters who appear not to have valid ID. The first list is of Voter IDs corresponding to people whose names did not appear in the PennDOT database. These people are considered to have “no ID” although it’s possible that they possess a form of valid ID not issued by PennDOT, such as a passport or a state university ID card. The second list was of Voter IDs of people whose PennDOT ID expired prior to November 6th, 2011, and would thus be invalid at the polls. The “expired ID” group and the “no ID” group are mutually exclusive.
Finally, we used the Census 2010 Redistricting Data for demographic information, namely race and ethnicity. We had previously aggregated this block-level data to Philadelphia’s ward divisions (boundary files available on Open Data Philly) to support Fix Philly Districts, a legislative redistricting competition. Because we were interested in the voting age population, we used data for the population 18 and older.
The Method
The first step was to join “no ID” and “expired ID” information to the voter file, a task accomplished in Microsoft Access using the unique Voter ID fields common to the three data sets. We then narrowed our analysis to “active” voters: people who have cast a ballot in the past four years.
This data set comprised 868,648 records for active voters, of whom 135,859 (15.6%) were listed as having no ID and an additional 146,742 (16.9%) were listed as having an expired ID, for a total of 282,601 registered voters (32.5%) who may be without valid ID. There were only 26 voters for whom ward division information was not available; they were excluded from this analysis.
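The join and filter steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the actual work was done in Microsoft Access, and the record layout, the field names (`voter_id`, `division`, `active`) and the sample records below are hypothetical. The last lines simply re-derive the citywide percentages from the counts reported above.

```python
# Hypothetical miniature voter file; real records carry many more attributes.
voters = [
    {"voter_id": "A1", "division": "0101", "active": True},
    {"voter_id": "A2", "division": "0101", "active": True},
    {"voter_id": "A3", "division": "0102", "active": False},
    {"voter_id": "A4", "division": "0102", "active": True},
]
no_id_list = {"A1", "A3"}   # voter IDs not found in the PennDOT database
expired_id_list = {"A4"}    # voter IDs whose PennDOT ID has expired


def tag_id_status(voters, no_id, expired):
    """Join both state lists to the voter file on voter ID; keep active voters."""
    tagged = []
    for v in voters:
        if not v["active"]:
            continue  # inactive voters are outside the analysis
        if v["voter_id"] in no_id:
            status = "no_id"
        elif v["voter_id"] in expired:
            status = "expired_id"
        else:
            status = "ok"
        tagged.append({**v, "id_status": status})
    return tagged


tagged = tag_id_status(voters, no_id_list, expired_id_list)

# Re-deriving the citywide shares from the counts reported in this post.
active_total = 868_648
no_id_count = 135_859
expired_count = 146_742
pct_no_id = round(100 * no_id_count / active_total, 1)
pct_expired = round(100 * expired_count / active_total, 1)
pct_either = round(100 * (no_id_count + expired_count) / active_total, 1)
```

Because the two state lists are mutually exclusive, the order of the `if`/`elif` checks does not change the result.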
We then summarized the data by ward division, generating a table containing several fields:
- Number of active voters
- Number of active voters appearing on the “no ID” list
- Number of active voters appearing on the “expired ID” list
- Percent of active voters appearing on the “no ID” list
- Percent of active voters appearing on the “expired ID” list
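A summary table with the fields listed above could be built roughly as follows. This is a minimal sketch, not the procedure we actually used; the input format (one record per active voter with hypothetical `division` and `id_status` fields) and the sample records are assumptions made for illustration.

```python
from collections import defaultdict


def summarize_by_division(tagged_voters):
    """Count active voters and ID problems per ward division, then add percents."""
    counts = defaultdict(lambda: {"active": 0, "no_id": 0, "expired_id": 0})
    for v in tagged_voters:
        row = counts[v["division"]]
        row["active"] += 1
        if v["id_status"] in ("no_id", "expired_id"):
            row[v["id_status"]] += 1
    summary = {}
    for division, row in counts.items():
        summary[division] = {
            **row,
            "pct_no_id": 100 * row["no_id"] / row["active"],
            "pct_expired_id": 100 * row["expired_id"] / row["active"],
        }
    return summary


# Hypothetical input: one record per active voter, already tagged with ID status.
sample = [
    {"division": "0101", "id_status": "no_id"},
    {"division": "0101", "id_status": "ok"},
    {"division": "0101", "id_status": "expired_id"},
    {"division": "0101", "id_status": "ok"},
    {"division": "0102", "id_status": "ok"},
    {"division": "0102", "id_status": "expired_id"},
]
table = summarize_by_division(sample)
```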
In ArcGIS we used the Census data to calculate the percentage of the voting age population in each ward division that was white, black/African-American, Hispanic/Latino, and Asian. We imported the voter information table into ArcMap and used the ward division number to join the voter data to the ward division shapefile with demographic information. The thematic map below (created in TileMill and hosted on MapBox) visualizes the percent of voters in each ward division who appear on either the “no ID” or “expired ID” list.

Figure 1. A map of the percentage of voters without valid ID, by ward division in Philadelphia. Click on the image to go to a full-screen interactive map.
The map makes clear that the spatial distribution of those who lack ID is non-random. Voters without ID are heavily concentrated around the University of Pennsylvania and Drexel University in West Philadelphia, as well as parts of North, West and Southwest Philadelphia. The rates of voters without ID are relatively low in the Northeast, Northwest, Southeast and Center City.
While there was clearly a spatial pattern to the data, the next step was to explore whether the racial composition of the ward divisions might explain the outcome variable, percent of the population without valid ID.
We generated scatterplots (Figures 2 through 5) and calculated correlation coefficients (Pearson’s r) in Excel to visualize and measure the magnitude and direction of the relationship between the proportion of a ward division’s voting age population that is of a given race or ethnicity and the proportion of registered voters who lack valid ID. Full-screen interactive scatterplots generated with Tableau Public are linked from each of the chart images.

Figure 2. Scatterplot of the percent white versus the percent without valid ID at the ward division level

Figure 3. Scatterplot of the percent black versus the percent without valid ID at the ward division level

Figure 4. Scatterplot of the percent Latino versus the percent without valid ID at the ward division level

Figure 5. Scatterplot of the percent Asian versus the percent without valid ID at the ward division level
The correlation coefficients were as follows:
- White: – 0.6473
- Black: + 0.5025
- Latino: + 0.2575
- Asian: – 0.1076
These correlation coefficients indicate that there is a strong negative correlation between the percent of white adults and the percent of voters lacking valid ID; that is, as the proportion of the population that is white increases, the proportion of voters with ID problems decreases. Conversely, there is a strong positive correlation between percent black and percent lacking valid ID: the greater the proportion of a ward division that is black, the greater the proportion of voters that may be barred from the polls. Similarly, there is a small to medium positive correlation for the Latino population and a small negative correlation for the Asian population.
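For readers who want to reproduce this kind of figure outside of Excel, Pearson’s r can be computed directly from its definition. The sketch below is illustrative only: the function name is ours, and the five data points are made up rather than taken from the ward division table.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for five ward divisions, for illustration only.
pct_white = [10.0, 30.0, 50.0, 70.0, 90.0]
pct_without_valid_id = [45.0, 40.0, 30.0, 25.0, 20.0]
r = pearson_r(pct_white, pct_without_valid_id)
```

A value near -1 or +1 means the points fall close to a straight line; the sign gives the direction of the relationship.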
Outliers
It was apparent from both the scatterplots and the maps that there was a small number of ward divisions with extraordinarily high proportions of the population lacking valid ID, namely in West Philadelphia in immediate proximity to the campuses of the University of Pennsylvania and Drexel University.
In OpenGeoDa, we explored the data further, generating a box plot that enabled us to identify nine ward divisions that were statistical outliers; these are mapped in Figure 6.

Figure 6. Ward Divisions near UPenn that are statistical outliers in terms of the percent of active voters without valid ID
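A box plot flags a value as an outlier when it lies beyond a “fence” set some multiple of the interquartile range (IQR) past the quartiles. The sketch below shows the upper fence only. It is an assumption-laden illustration: the division numbers and percentages are invented, and Python’s `statistics.quantiles` may compute quartiles slightly differently than OpenGeoDa does, so results near the fence could differ.

```python
import statistics


def upper_outliers(pct_by_division, k=1.5):
    """Return divisions whose value exceeds Q3 + k * IQR (the upper fence)."""
    values = list(pct_by_division.values())
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return sorted(d for d, v in pct_by_division.items() if v > fence)


# Invented percentages of voters without valid ID for eight divisions.
sample = {
    "0101": 20.0, "0102": 25.0, "0103": 30.0, "0104": 35.0,
    "0105": 40.0, "0106": 45.0, "0107": 50.0, "0108": 95.0,
}
mild = upper_outliers(sample, k=1.5)     # common box plot default
extreme = upper_outliers(sample, k=3.0)  # stricter fence
```

The multiplier `k` controls how strict the rule is: 1.5 is the usual default, while 3.0 flags only the most extreme values.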
We were then able to quickly regenerate the scatterplots as correlation plots, recalculating the correlation coefficients exclusive of the outliers. With these outliers excluded, the correlations are even stronger:
- White: – 0.6623
- Black: + 0.5266
- Latino: + 0.2638
- Asian: – 0.1670
It’s worth noting the distribution of the data, particularly for the Latino and Asian populations. While the data for the white and black shares of the population is fairly evenly distributed (though there is clustering at the high and low ends, indicating residential segregation), the data points for Latinos and Asians are heavily clustered near zero because there are many ward divisions without substantial numbers of these groups. If you play with the filters on the interactive scatterplots you can see the trendline change. For instance, if you exclude from the analysis those ward divisions where Asians make up less than 5 percent of the population, the slope of the trendline shifts from negative to positive; the direction of correlation shifts as well, indicating that in the remaining divisions a larger Asian population share is associated with higher rates of lacking valid voter ID, similar to the pattern for blacks and Latinos.
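The effect of such a filter can be imitated with a toy example. The numbers below are invented purely to show how a cluster of points near zero can reverse the sign of a correlation; they are not drawn from our data, and the helper names are hypothetical.

```python
import math


def pearson_r(points):
    """Pearson's r for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / math.sqrt(var_x * var_y)


def filter_by_min_share(points, min_share):
    """Keep only divisions where the group's population share is at least min_share."""
    return [(x, y) for x, y in points if x >= min_share]


# Made-up (population share, share of voters without valid ID) pairs.
divisions = [
    (0.01, 0.50), (0.02, 0.40), (0.03, 0.45),  # group nearly absent
    (0.10, 0.20), (0.20, 0.25), (0.30, 0.30),  # group more substantially present
]
r_all = pearson_r(divisions)
r_filtered = pearson_r(filter_by_min_share(divisions, 0.05))
```

In this toy data the correlation is negative across all six divisions but positive once the divisions below the 5 percent threshold are dropped.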
Conclusions
Based on these linear regression models, it appears that Pennsylvania’s new strict photo ID requirement may be in effect a racially discriminatory voting procedure. That said, this analysis is somewhat limited in scope. We only had data for the city of Philadelphia, and it’s possible that with statewide data other patterns may become apparent. It would not be surprising if patterns in rural or even suburban areas were different.
We ran a simple linear regression model; there may be additional variables, such as educational attainment or household income, that better explain the observations and a multiple regression model may offer an even more compelling explanation. This is also a model that is being run on aggregated data; any conclusions we make characterize patterns at the ward division level and don’t enable us to make definitive statements about individuals (read all about the ecological fallacy).
All that being said, we hope that this analysis was able to cast some new light on the potential impact of our state’s new voter ID law, beyond the already striking number of Pennsylvanians who could be disenfranchised by the new requirements.
UPDATE
A number of people have asked questions regarding the quality of the data used in this study. We did not initially address this question at length, but we will now go into more detail about the two data sets that we used and the patterns that we see in each.
“No ID”
WHAT IT MEANS
The first set of data released by the state was a list of those who appear on voter rolls but not in the PennDOT database; we call this the “no ID” list. In Philadelphia, this list included 135,859 active voters. There are a number of reasons why a match might not have been made with the PennDOT database. As we mentioned previously, Bob Warner at the Inquirer had a good summary of some of the reasons:
In addition to the stated problem with people who use different first names on different documents, it appears the state’s computers had problems distinguishing names containing spaces, like Mary Ellen, or Van Dyke; names with hyphens, like Olivia Newton-John; and names that computers sometimes spell with spaces, like Mc Dougall.
In Philadelphia alone, more than 10,000 people whose names begin with “Mc” were listed as not having PennDot ID. They included state Supreme Court Justice Seamus P. McCaffery, a driver whose name is spelled Mc Caffery on the city’s voter rolls.
Names with apostrophes, like O’Brien and O’Neill, were especially troublesome because PennDot’s computer system doesn’t use apostrophes, according to David Burgess, a Department of State deputy secretary in charge of computer operations.
While these mismatches may seem insubstantial, the voter ID law requires that the names on the ID and on the voter rolls must be “substantially conforming” without defining what this means. This gives poll workers a great deal of latitude to determine whether someone may or may not vote. According to a survey and analysis commissioned by the ACLU and conducted by political science professor and survey and elections expert Matt Barreto, 97.8% of voters believe that they have a valid ID. The researchers asked the question “A lot of people go by a nickname or change their name when they get married. Is the name that is printed on your {driver’s license / official photo ID} your full legal name, exactly as it would appear on the Pennsylvania voter registration record, or is there a difference?” and found that “an additional 4.3% of respondents who had an up-to-date ID, reported that their name listed on their ID did not match that which would appear on the voter registration records” (9).
This is not to say that there aren’t substantial problems with the data. The team at the Inquirer did some additional investigation, telephoning some voters who appeared on the list to verify whether they had ID. They found that approximately 75% of those with whom they spoke reported that they did in fact possess PennDOT ID, while 25% did not. Despite these problems, we were interested in including the data in our analysis to see if it revealed spatial or statistical patterns.
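To make the name-matching problem concrete, here is a small sketch of how an exact string comparison fails on the kinds of names the Inquirer describes, and how a simple normalization step would catch some of them. This is not the state’s matching procedure, which we have not seen, and it says nothing about what “substantially conforming” means legally; the function names are hypothetical.

```python
import re


def normalize_name(name):
    """Uppercase a name and drop spaces, hyphens, and apostrophes."""
    return re.sub(r"[\s\-'’]", "", name).upper()


def same_name(a, b):
    """True when two spellings differ only in case, spacing, or punctuation."""
    return normalize_name(a) == normalize_name(b)


# Exact comparison treats these as different people.
exact_match = "Mc Caffery" == "McCaffery"

# Normalized comparison catches spacing and punctuation differences.
spacing_match = same_name("Mc Caffery", "McCaffery")
apostrophe_match = same_name("O’Brien", "OBrien")
hyphen_match = same_name("Newton-John", "Newton John")

# Genuinely different first names are still (correctly) not matched.
nickname_match = same_name("Bob", "Robert")
```

Normalization of this kind handles spacing and punctuation but does nothing for nicknames or changed surnames, which need other information such as date of birth or address to resolve.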
COLLEGE STUDENTS
The map above reveals the first striking pattern: the ward divisions with the highest proportion of voters on the “no ID” list are clustered around colleges and universities. This visual inspection is borne out by the statistical analysis: these are the ward divisions with values more than 3 IQR above the upper quartile mark. The cluster around University City, where the University of Pennsylvania, Drexel University and the University of the Sciences are located, is by far the most prominent, but additional clusters are visible around Temple University, LaSalle University, Saint Joseph’s University and Philadelphia University; there is another cluster just south of City Hall that may or may not be related to the University of the Arts.
This is not a surprising finding given that college students may rely primarily on their student ID cards and not have a Pennsylvania driver’s license. This post highlights two reasons why students may lack PennDOT ID:
- Pennsylvania has more out-of-state freshmen than any other state (Symm v. United States affirmed that students can vote where they attend college, and cannot be subject to any greater obstacles than other residents when registering)
- Young people are increasingly choosing not to drive: in 2010, 15.7 percent of 20 to 34-year-olds did not have a driver’s license, so even PA natives may not be registered with PennDOT
SMALL CORRELATION WITH RACE
Aside from the clusters around the city’s universities, visual inspection suggests slightly higher rates in North and West Philadelphia and lower rates in the Northeast and Northwest. Excluding those clusters around the universities reveals a small correlation between percent with no ID and percent white and black, and essentially no correlation with percent Latino or Asian:
- White: – 0.1276
- Black: + 0.1255
- Latino: + 0.0093
- Asian: – 0.0522
If spelling errors and mismatches like the ones described by Warner were randomly distributed across the city we would expect no correlation of percent “no ID” and the ethnic composition of a ward division. That we find a small correlation suggests that there is some underlying pattern.
Expired ID
WHAT IT MEANS
The new law requires that a PennDOT-issued ID must not have expired in the past year in order to be valid as a form of ID at the polls. In addition to the “no ID” list described above, in late July the state released to county officials a second, larger list of voters whose IDs had expired prior to November 6th, 2011; this list and the No ID list are mutually exclusive. In Philadelphia, this list included 146,742 active voters (16.9% of the total). This list represents registered voters who were positive matches in the PennDOT database, so presumably the data quality issues discussed above do not apply.
Many people are not aware of the new voter ID law’s requirements, and a substantial portion erroneously believe that they have valid ID. The survey by Barreto found that the vast majority of eligible voters believed that they had a photo ID (97.8%), but “when asked follow up questions about whether the photo ID has an expiration date, and is current” they found that “among all eligible voters 10.7% lack a non-expired PennDOT ID (driver’s license or non-driver’s ID issued by PennDOT), while 9.3% of registered voters, and 8.6% of 2008 voters lack a non-expired PennDOT ID” (24).
These survey estimates match up fairly well with the roughly 11% of voters statewide that appear on the expired ID list, but Philadelphia’s 16.9% figure outstrips the state and all other counties by a wide margin. Just as the lack of ID in Philadelphia is disproportionately high relative to the rest of the state, we were interested in seeing if particular areas of the city showed disproportionately high rates of expired ID.

Figure 8. A map of active voters who appeared in the PennDOT database, but whose ID will have been expired more than a year on election day.
CORRELATION WITH RACE
In terms of the distribution of values, there is only one statistical outlier (more than 1.5 IQR above the upper quartile): ward division 2520, south of Kensington & Allegheny Avenues. Ward divisions with high rates of expired IDs again appear clustered in North, West and Southwest Philadelphia. Especially low rates can be observed in the Northeast, Northwest and most of Center City and (inverting the trend seen with “no ID”) around the universities.
This data set exhibits a meaningful correlation with race, with varying degrees of strength (strong in the case of black and white, medium for Latino and small for Asian):
- White: – 0.7360
- Black: + 0.5778
- Latino: + 0.3240
- Asian: – 0.2123
CONCLUSIONS (REPRISE)
- A very strong association of “no ID” with student populations.
- A weak correlation between “no ID” and race, possibly explained by more evenly distributed problems with matching names in the PennDOT database.
- A strong correlation between “expired ID” and race.






13 Comments
Very nice analysis. I’d also be curious if there was a correlation between percent w/o ID and age group.
(PS: There is a typo in the caption for figure 3, it should say “black” and not “white”.)
This is a great question. Tom Boyer has a breakdown by age for Philadelphia that indicates yes:
http://examinevoterid.blogspot.com/2012/08/the-old-and-young.html
Similarly, the ACLU PA commissioned a study by Matt Barreto that indicates yes (also, women lack ID more than men):
http://www.aclupa.blogspot.com/2012/07/voter-id-day-two-statistically-speaking.html
And thanks for the typo catch!
That’s an awesome analysis.
The missing data set is a list of people who have state IDs but were not matched in the voter registration files. Working the polls for the primary (42nd Ward, 6th Division), we asked to see everyone’s ID as a practice run for November. Many names differed in the way they were hyphenated, split between first and last fields, ordered (Asian names in particular, last and first would be reversed), spelled (Gonzales vs. Gonzalez). Out of 120 or so voters, only one lacked valid ID — and he now has it. Boots on the ground says 32% for this division. Calls into question the entire methodology of the study — and its conclusions.
WordPress removed some of my text.
Boots on the ground says less than 1% for this division, while the data above says greater than 32%.
Hand matching should be done between those with ID who were not matched in the registration lists, and those registered who did not match the PennDOT lists. But you don’t have that data to do a comparison.
Name variants also explain why more women “lack ID” than men — married (or previously married) women have a wider variance in last names.
Hi Walter. Thanks for your feedback and for sharing your observations about your experiences as a poll worker. Although we don’t have all of the data that you are mentioning, I believe that the non-random spatial distribution of the data is meaningful. I’ll respond to a few of your points here.
We did split out the data and the analysis between the “no PennDOT ID” and the “expired ID” data. I will post an update with these results shortly, but the pattern is that the “no ID” folks are concentrated around University City — not surprising if you consider that many university students likely don’t have a PA drivers license. These data exhibit very little correlation with race, which would be consistent with a random distribution of database mismatches.
The “expired ID” population is distributed over the rest of the city and shows a high degree of correlation with race. Since these are positive matches in the PennDOT database, data quality is likely less of a problem. Were you also checking the expiration dates on ID? It looks like in your ward division 22% of active voters had expired ID.
Your observation about different spellings, which doubtless contributed substantial error to the PennDOT “no ID” list, actually highlights a problem with the law. The law says that an ID is valid when the name on it “substantially conforms” with the name on the voter rolls, but doesn’t define this condition. In the cases that you cite you may choose to let the voters cast a ballot, but other poll workers will have the latitude to refuse them or to issue a provisional ballot. In ward division 4206 11% of the cases were not found in the PennDOT database, which would correspond to about 13 people for the primary you mention. Based on the variation in spelling differences that you mention, it sounds like you came across mismatches not infrequently, putting the “invalid ID” problem in your district well above 1%.
To your observation about women’s names, Barreto’s findings about lacking valid ID are based on a survey, not on the PennDOT data we used, so it’s not attributable to database mismatches. And just as above, if a woman has an ID but the name on it doesn’t match the one on the voter rolls to the satisfaction of the poll worker she could be denied the right to vote. For the purposes of voting under the new law, this is a very real lack of valid ID.
I wanted to respond to Walter Rice’s comments.
First of all, roughly half of the people with ID problems in Philadelphia (and in this analysis) are on the “expired ID” list, which means they were found in the PennDOT database but their license expired before 11/2011. These people, around 146,000, presumably will have to line up at PennDOT and renew their ID or they lose voting rights. This is the group the state is saying won’t require a birth certificate because of a simplified procedure, but they will still probably have to go to DMV and wait in line for a new ID card.
As for the “no id” list, Mr. Rice’s observations echo the Philadelphia Inquirer’s report Sunday that many of the people on the “no id” list in fact have IDs but the names are spelled differently on their voter registration.
Examples are women who changed maiden name to husband’s name; Asians who reversed given name and surname (or had it done for them) on one record or the other. And Hispanics who use multiple family names but don’t have room for them on forms.
These people MAY be able to vote but they may not. There are no guarantees at this point. During the primary, the law wasn’t in effect.
In November the ID will be checked against the voter registration, and particularly if Pa. remains a swing state, poll workers will not be making the call about how closely the name needs to match. There will be thousands of party volunteers, many of them lawyers, deployed to Pennsylvania, looking over the shoulders of poll workers, trying to get as many ballots thrown out as possible.
It is possible even people with minor name discrepancies may be forced to file a provisional ballot, which in the overwhelming majority of cases will go to the landfill six days later without being counted.
So yes, many of the people on the “no id” list do indeed have IDs. But whether their vote will count in November is another question.
Tom
The concept of substantial conformity points to the need to adhere to the design of the bipartisan election board. The supposedly elected judge, still often illegally appointed by the Democratic ward leader in Philadelphia, has the say; any poll worker or watcher can challenge a voter, but for each voter challenged they need to put up $10. The voter then simply needs to fill out an affidavit, and then is allowed to vote on the machine — which means that their vote is unable to be removed, because it is anonymous. So the party lawyers may be able to cause confusion or delay, but if the process is followed, they won’t have an effect on whether or not people are able to vote. The only change here is that ID is required. It really isn’t that difficult.
And the fraud problem is real — I’ve seen it firsthand — but nobody will take a report. No statistics if you won’t write it down. (Just like Penn State’s official Clery Act crime reports show no child sexual abuse — to apply the Democratic argument as to voter fraud, would say that it never happened.)
If you’re correlating PennDOT ID with race, all you’re proving is that people who live in areas with a high percentage of African Americans are less likely to drive (or whatever other reason you would get a PennDOT ID).
The map of areas without PennDOT ID’s shows your flaw in trying to substitute in PennDOT ID’s for valid voter ID’s; University City students may not drive, but all university students are issued ID’s that are valid at the polls. Additionally, all CoP workers are also issued identification that is acceptable at the polls. People who simultaneously live and work in the city are less likely to drive.
All the analysis in the world is whistling past the graveyard. Bottom line: The voter ID laws that are being put into place are intended to shut down voting rights not expand them. This is a vast, messy, complicated electorate that could never be managed by the minutia of the computer. The amount of “voter fraud” on election day by individual voters pales in comparison to the fraud that will be committed by these new laws, or the rigging of voting machines, or voter intimidation. What is being DONE about this? Where is the OUTRAGE?
We ought to be ashamed about what is happening with voting around our country. Our goal should be to make access to our election ballots easy and accurate at the same time. The methods that the Republicans have chosen are repugnant to me. But the Democrats I know who refuse to address the issues raised at all only play into the hands of Republicans who are willing to use voter suppression techniques to their own ends. If we are not careful, we will destroy our American Democracy and replace it with something we may not like or recognize!
What is missing here is information about how to get involved in volunteering to help people get their id’s in case the PA. Supreme Court affirms the shameful opinion of the lower court. How can I help?