Set your wayback machine to 1876. The place: the Philadelphia Centennial Exhibition. Wandering around the festival grounds, you would have seen several cameramen snapping away. Some of those images have survived admirably as the Free Library of Philadelphia’s Centennial Exhibition Collection. Over at PhillyHistory.org, we recently added a selection of those images, and the geocoding presented quite a challenge.
The first hurdle was that the vast majority of the Centennial Expo’s buildings (and some of the streets) don’t exist anymore! The Expo took place in a swath of Philly’s Fairmount Park, most of which has turned back into trails and fields over the years. Some of the buildings were torn down just after the Expo closed; some of them still exist, but in other places. Only a very few buildings and landmarks still occupy the same space they did back in 1876. But do not despair, for there was a map! Included in the Free Library’s records was a map of the entire fair, complete with all of the buildings and roads built for the Expo, along with neat things like fountains and plazas. Dana Bauer, one of Azavea’s employees and a geo-referencing expert, took charge of the map and managed to line up enough of the old landmarks with existing geography to give us very accurate coordinates for all of the Centennial Expo’s old buildings! Hooray!
So now we have coordinates, but we need to be able to tie those coordinates to the image records in our system. Here the data was slightly uncooperative. Many of the image records mentioned a building in some fashion, in one field or another. The challenge was that the text in the record didn’t always match the official name of the building from the map. For example, the Main Exhibit Hall was mentioned in records as Main Exhibition Building, Main Hall, Main Bldg and several other permutations. We spent some time figuring out around 20 aliases for some of the more popular buildings in the collection.
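To give a feel for the alias problem, here is a minimal sketch in Python of how variant spellings could be folded into one official name. It is not our importer code: the `ALIASES` table, the `normalize` and `canonical_name` helpers, and the exact official-name string are assumptions made up for illustration. Only the three Main Exhibit Hall variants come from the records described above.

```python
import re

# Hypothetical alias table: each variant spelling maps to the official map name.
# The variants are the ones mentioned in the post; the table layout is assumed.
ALIASES = {
    "main exhibit hall": "Main Exhibit Hall",
    "main exhibition building": "Main Exhibit Hall",
    "main hall": "Main Exhibit Hall",
    "main bldg": "Main Exhibit Hall",
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def canonical_name(text):
    """Return the official building name mentioned in text, or None."""
    cleaned = normalize(text)
    # Try longer aliases first so the most specific spelling is checked first.
    for alias in sorted(ALIASES, key=len, reverse=True):
        if re.search(r"\b" + re.escape(alias) + r"\b", cleaned):
            return ALIASES[alias]
    return None


print(canonical_name("View of the Main Bldg., east end"))
print(canonical_name("Fountain near the lake"))
```

Normalizing before matching means that stray punctuation like the period in "Bldg." doesn't need its own alias entry.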
The last step was to build a parsing-and-geocoding module to find the building references in the record data, match them up with the coordinates we found and add that information to our database. Easy, right? Not so fast. All 150-some coordinate pairs and their building text keys needed to be accessible somehow to our data importer. We thought for a while about how best to do that. 150 entries isn’t a huge list, but it’s not short either. We didn’t want to have to mess with it at all once we set something up. This list also wasn’t likely to change over time, so it didn’t need to be very accessible or easily editable by people. However, we were also going to need to return it to the Free Library in some format or another at the end of the project. The coordinates were in an Excel spreadsheet by this point, but we didn’t want to deal with the overhead of accessing it directly. We wound up creating a data dictionary inside our importer: kind of a mini geocoder table just for this swath of Fairmount Park circa 1876. We left the parent Excel spreadsheet alone as both a backup archive and the basis for the document we’ll be returning to the Free Library.
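A data dictionary like this can be as plain as a hard-coded lookup table. The Python sketch below shows the general shape, assuming a table keyed by building name. The `BUILDING_COORDS` name, the `lookup` helper and the coordinate values are placeholders for illustration, not the real project data.

```python
# Mini geocoder table: building name -> (latitude, longitude).
# The coordinates are placeholders for illustration, not the georeferenced values.
BUILDING_COORDS = {
    "Main Exhibit Hall": (39.9790, -75.2090),
    "Machinery Hall": (39.9785, -75.2150),
}


def lookup(building):
    """Return (lat, lon) for a building name, or None if it is not in the table."""
    return BUILDING_COORDS.get(building)


print(lookup("Main Exhibit Hall"))
print(lookup("Some Other Building"))
```

Since the list is small and essentially frozen, keeping it in code avoids any file or spreadsheet access at import time, at the cost of needing a code change if an entry ever has to be corrected.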
While looking through the data for building aliases, we also noted the fields where these names were found. So all we had to do was write a small function to loop through these fields, do some text matching on our data dictionary and add the coordinates to the record in our system. Out of over 1500 images, we wound up geocoding over 90% using this method, and most of the rest simply didn’t have a building reference to find. The entire import, images and all, took under 10 minutes. Not bad at all!
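For the curious, the loop-and-match step might look something like the Python sketch below. This is a simplified illustration under stated assumptions rather than our actual importer: the field names in `SEARCH_FIELDS`, the `COORDS_BY_KEY` table, the `geocode_record` function, the sample records and the coordinates are all hypothetical, and the matching is a plain case-insensitive substring check.

```python
# Hypothetical field names and table contents; coordinates are placeholders.
SEARCH_FIELDS = ["title", "description", "notes"]
COORDS_BY_KEY = {
    "main exhibition building": (39.9790, -75.2090),
    "main hall": (39.9790, -75.2090),
    "machinery hall": (39.9785, -75.2150),
}


def geocode_record(record):
    """Attach lat/lon from the first building key found; return True on a match."""
    for field in SEARCH_FIELDS:
        # Missing or empty fields are treated as empty text.
        text = (record.get(field) or "").lower()
        for key, (lat, lon) in COORDS_BY_KEY.items():
            if key in text:
                record["lat"], record["lon"] = lat, lon
                return True
    return False


records = [
    {"title": "Interior, Main Hall", "description": ""},
    {"title": "Crowd on opening day", "description": "Near Machinery Hall"},
    {"title": "Unidentified view", "description": None},
]
matched = sum(geocode_record(r) for r in records)
print(f"geocoded {matched} of {len(records)} records")
```

Records with no building reference are simply left without coordinates, which is what happened to most of the images we couldn’t geocode.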




