Geocoding with Cicero in Google Docs: An Open Source Collaboration

Geocoding with Cicero in Google Docs: An Open Source Collaboration

A Google Spreadsheet with addresses geocoded and stamped with their US House districts.

Several days ago as our Data Analytics and Marketing teams were planning the Azavea “Lunch and Learn” workshop series (our final Wednesday workshop is still open! Register here.), my colleague Jeremy Heffner discovered a script written originally by Dave Cole and Tom MacWright at MapBox: geo-googledocs.

Geo for Google Docs is a small Javascript add-on that adds address-based geocoding and GeoJSON export capabilities to Google Spreadsheets. So, if you have a long list of street addresses you need to geocode, you can easily throw them in a Google Spreadsheet and simply copy/paste this script via the Google Docs Script Editor. After only a few button clicks, latitude and longitude columns are added to your spreadsheet and populated with coordinates for each of your address records. How easy is that!?

Jeremy and Sarah Cordivano realized this would be a great tool to demo the principle of geocoding to attendees at Sarah’s “From Databases to Maps” Lunch and Learn workshop. Google Docs/Drive is free for personal use and nonprofit organizations and also used by many businesses, so tons of people are familiar with the software already. Many small organizations already use spreadsheets to collect data on the constituents they serve, and even if they’ve already invested in a more robust Contact or Constituent Relationship Management system (CRM) like Salesforce or CiviCRM, it’s easy to move records back and forth between a CRM and spreadsheets.

Our Cicero API supports address-based geocoding too, so Jeremy spent some time integrating Cicero as a geocoding provider into the MapBox geo-googledocs script, which already supported the Yahoo PlaceFinder and Mapquest Nominatim APIs.

We were all excited to see Tom from MapBox accept and merge Jeremy’s pull request in time for Sarah to show off the script and geocoding using the Cicero API at the July 17th Lunch and Learn workshop. She also wrote this handy PDF tutorial for installing the script and then publishing the results in a CartoDB web map.

While geocoding is cool, what really makes the Cicero API powerful is the wealth of data we have available for matching addresses to local, state, and national legislative districts, elected officials, census geographies, watersheds, schools, and more. We frequently get requests from advocacy groups and other clients to batch process a database of constituent addresses and “stamp” each record with district information. This got me thinking – it was easy enough for Jeremy to add latitude and longitude fields from Cicero to the script, what if customers could use it to batch process their own spreadsheets and include other data fields from Cicero like congressional districts or elected official contact info?

So, I made my own fork of the geo-googledocs script on Github and spent some time working on exactly that. Specifically, I added a “new” geocoder to the list, “cicero_ushouse”:

[github file = “/andrewbt/geo-googledocs/blob/master/MapBox.js” start_line = “57” end_line = “77”]

When the cicero_ushouse geocoder is used, the script makes a query to Cicero’s /legislative_district endpoint (much like Jeremy’s original modification) but filters on district_type=NATIONAL_LOWER, which for addresses in the USA will result in the return of their US House of Representatives district. I also added code in other areas of the script to pull the district’s label, state, and alphanumeric ID, and populate three additional columns in the spreadsheet with those fields. Mapbox’s code was flexible enough that these additional columns are added as attributes to each of the address points when exporting the data to GeoJSON – I didn’t need to add any extra code! Having political districts as attributes for your addresses can enable you to make choropleth maps or other analysis of where your addresses lie politically. My fork is only a proof of concept at this point, but I plan to keep adding to it. But right now it demonstrates that with just a few extra lines of Javascript, anyone can customize geo-googledocs to pull whatever fields they’d like from Cicero’s geocoding endpoints.

geo-googledocs is a tiny, one-file repository that accomplishes a fairly basic (though crucial) task, but itĀ embodies a lot of the power I see in open source software. The MapBox developers originally wrote it a year ago. Things could have ended there. But thanks to Github, we were able to discover, fork, improve, and contribute to the original project. Even if it is just one file, it’s pretty cool for informal collaboration like this to happen so seamlessly!