Azavea Atlas

Maps, geography and the web

Sprinting to Philadelphia: Azavea Hosting the 2015 OSGeo Code Sprint!

Open Source development is based on collaboration and communication, and yet a software project may have contributors strewn across the world in different time zones and even used to speaking different languages. The reality of collaborating via the Internet – asynchronous, textual – means it can be harder for new contributors to get up to speed, harder for experienced developers to help each other, and harder to develop features together. Being physically separated from your collaborators also inhibits growth of community and friendships.

2014 Vienna Code Sprint

A shot of the 2014 OSGeo Code Sprint. Come have fun with all these focused developers!

For a few years now, the OSGeo Foundation has alleviated these challenges and strengthened the global community of open source geospatial developers by supporting a “code sprint” in a different city each year. After the 2014 event was held in Vienna, it’s coming back across the pond.  This February 9th through 13th, Azavea will be pleased to host developers from around the world as they descend on Philadelphia for the 2015 OSGeo Code Sprint. This is the first time the Sprint has been held in Philly and we’re all excited to welcome everyone to our home – a real hotbed of civic hacking, open data, and geo nerd communities that often rely on these geospatial projects. We’re also planning a few fun evening activities in different parts of the city to give participants plenty of chances to catch up with old friends, make new ones, and experience Philly.

The Code Sprint is not like the “civic hackathons” and other events Azavea has organized in the past. Paul Ramsey, one of the core contributors to the PostGIS project, wrote a bit about what the 2013 Boston Code Sprint was like and why you might want to attend. There won’t be competitive teams, prizes or judges like a hackathon. Instead of a short weekend, we will be sprinting together for most of a week. We’ll be improving and adding new features to the foundational geospatial tools common to pretty much any app with maps or geodata, like PostGIS, Cesium, uDig, QGIS, GDAL, PDAL, and GeoTrellis. We’ll endeavor to achieve the same welcoming and friendly atmosphere we have at all our events, but if you’re interested in attending, this is definitely an event where preparation pays off. You can just show up and make a contribution, but if you’re not already contributing to a project, it will be more productive if you’ve learned a bit about the project, have a good handle on the languages in use, have taken a look at the issue backlog, and have given some consideration to what contribution you’d like to make during the week. We’d also encourage you to check out the latest development branch and set up a development environment on your laptop in advance.

OSGeo Sprints in past years have been focused on projects specifically stewarded by the OSGeo Foundation and the event began as a gathering of the “C Tribe”, the software applications that use C as their primary language. However, this year we wanted to make the tent of open source geospatial a little bigger and encourage participants to work on all kinds of projects, including projects housed within LocationTech, OSGeo, or even independent projects like Leaflet or CartoDB.

So come join us in Philadelphia next month! Simply put your name on the list on the wiki page so we can plan for you to be here, and book your travel – including a room at the Loews Philadelphia at our discounted rate. Open Source is usually a marathon, but let’s sprint while we can!

“It Started with a Map”: Second Annual LocationTech & GeoPhilly Conference

Conference attendees

Photo credit Delaware River Watershed Initiative

It’s hard to believe, but with the return of the much-expanded 2014 LocationTech Tour to Philadelphia this past November 20th, the GeoPhilly Meetup celebrated its first birthday! That means Sarah Cordivano and I have organized and hosted over 12 monthly GeoPhilly meetups over the past year, including ones with PhillyPUG, MaptimePHL, and even last year’s Geo Open Source conference. The event last year was so well attended and enjoyed that we decided to bring back this larger showcase and celebration of the Philly open source geospatial community again.

Old City’s Arch Street Quaker Meeting House

We kept the venue, the beautifully historic Arch Street Meeting House, and much of the same format with a slate of several talks on a weekday afternoon and a trip to the bar afterwards to hang out with fellow map nerds.

Michael Brennan of Secondmuse

Our Keynote speaker this year, Mike Brennan from SecondMuse, “zoomed out” a bit away from our map-mania and had us examine some of the broader human elements and social structures of open source collaboration in which we develop our map-based tools. I had the good fortune to work more with SecondMuse this year on Al Jazeera’s Canvas hackathon, and it was exciting to hear more about the other projects SecondMuse is involved in globally that advance civic impact through open source collaboration. That’s a goal held close by many in our geospatial community as well.

Ingrid Burrington of Mapzen

Up next was Ingrid Burrington from Mapzen, who kept us asking questions about how we collaborate – in this case, how we all decide what to call where we are. Specifically, her talk was on gazetteers and geocoding in open source; how both of these are still hard problems to figure out; and how open source administrative and political boundary data can help.

Ryan Arana and Josh Yaganeh of Esri

Ryan Arana and Josh Yaganeh came all the way from Esri’s Portland R&D center and gave a talk on a new open source tool they built. It is useful for playing with and inspecting GeoJSON data sent over HTTP, and uses some cool technologies in the form of WebSockets and Go.

Lauren Ancona, our first Philly “civic hacker” of the day, who recently started as a Data Scientist at the City, spoke next about her obsession with parking data and mapping parking rules in Philadelphia using open source. She encouraged us to try and help out, and fielded some inspiring and candid questions about what it’s like being a beginner in maps and civic hacking and how to keep getting better at it.

Our second local civic hacking (or civic mapping?) project was presented by James Tyack. Like Lauren’s parking projects, his project is based around transportation and crowdsourced reports, but it focuses on accessibility for city dwellers with disabilities, who often face considerable barriers to getting around town every day. James shocked us with examples of how even mapping apps aimed at accessibility concerns can themselves be inaccessibly designed – a concern which Sarah recently blogged about.

Azavea’s own Rob Emanuele gave the penultimate talk of the evening on recent improvements to GeoTrellis – our high performance processing library for very large raster data – that take advantage of Apache Spark to process data stored in Accumulo. The combination can serve map tiles from very complex datasets like land cover types for the whole USA, or global climate change models very quickly for snappy web applications.

Matt Amato of AGI's Cesium team

Our final talk of the evening was by Matt Amato from AGI, who spoke about the Cesium project and gave examples of the power inherent in time-dynamic geospatial visualizations. As geographers focused on visualizing place, we sometimes have more trouble visualizing time. Web geodata formats like GeoJSON and KML don’t always make the best accommodations for temporal data, and that’s where Cesium’s CZML standard comes in. Matt wowed us with WWI battle maps and car traffic simulations incorporating an aspect of time.

After a set of talks this cool, our group was excited to head to the Buffalo Billiards bar to talk about what we had learned. We had 94 attendees come for the afternoon – 1 more than last year! – and even more show up for the evening. It has been a fun first year with the GeoPhilly community and I’m looking forward to the next twelve months as we continue growing in numbers, skills, and maps!

Five New GIS Tools in 2014

Last year saw the rise of the #geohipster hashtag, #maptime meetups and continued expansion and adoption of OpenStreetMap. Here are five exciting geospatial software apps and tools released in 2014 and sure to grow in 2015:

Morgan Herlocker at Mapbox left us with perhaps the greatest Christmas present of all in late December: geoprocessing in the browser. Turf is a JavaScript library for performing common geospatial functions such as buffering, merging or calculating centroids. One of the nice features of Turf is that since it runs in the browser, it can run completely client-side and offline, so it doesn’t need to connect to an external API. It’s modular too, so you include only what you need in your web app. Coming soon: the end of the spatial database as we know it?


Also from Mapbox, 2014 saw the release of Mapbox Studio, a major upgrade to the desktop-based TileMill software. Like TileMill, Mapbox Studio allows you to add data (such as shapefiles, GeoJSON, CSV) and style it using CartoCSS. But it also makes connecting to and styling Mapbox’s streets, terrain and satellite data extremely easy – it’s streamed right into the application. Source data can also be streamed from your Mapbox account. One of the nice features is the color picker, which essentially eliminates the need to figure out what hex code to add: you can simply choose the color and it will do the rest. Along the same lines, typography took a big step forward in Mapbox Studio with over 300 fonts built in, selected for their use with digital cartography. As always, you can add your own custom fonts as well.

Taking story maps to the next level, Odyssey.js was released by CartoDB in the summer. Inspired by experiments in interactive storytelling like the excellent New York Times Snow Fall project, Odyssey is a JavaScript library for developing interactive map-based stories. There’s also a convenient web-based editor and a number of templates, which allow anyone to craft a professional interactive story in minutes. But of course it’s also an open source JavaScript library, so developers can build their own custom interactions with ease.

You can’t spell ArcGIS without GIS. In 2014, Esri released a brand new desktop software application called ArcGIS Pro. It’s a major overhaul to the Esri desktop ecosystem, and while not meant to replace ArcGIS Desktop (merely supposed to work alongside it), I can’t see how it wouldn’t in time. It’s a new multi-threaded 64-bit application with a fresh user interface for visualization and analysis. It’s available in beta now and the official release is anticipated for this month. The beta is fairly robust, with fast rendering and much-improved color palettes and symbology options, and though the ribbon interface (similar to Microsoft Office) is simpler, it takes some time getting used to. Pro has a streamlined interface for working with data stored in ArcGIS Online. I’m hoping to see some features released soon that will make it easier to work with open data and formats.

Unmanned aerial drones were in the news for a lot of reasons in 2014, but mostly because of Amazon. However, with the launch of OpenDroneMap, there will soon be an open source toolkit to process civilian drone products. As of now, it can only process point clouds, but the project is active on GitHub and expects to have tools to process and upload high-resolution imagery, digital elevation models and other data.

Though not a new GIS tool in 2014, the Humanitarian OpenStreetMap Team did some amazing work mapping the Ebola outbreak in West Africa and improving map coverage of the Philippines after Typhoon Haiyan. 2014 was a big year for spatial with open geo taking off and satellite data becoming more attainable than ever. With a track record like that, 2015 should be an exciting year.


Summer of Maps 2015: Now Accepting Applications for Nonprofits

Summer of Maps logo

Applications are now open for Nonprofits seeking pro-bono GIS analysis through the Summer of Maps program.  Summer of Maps offers fellowships to student GIS analysts to perform geographic data analysis for non-profit organizations.  The program matches non-profit organizations that have spatial analysis and visualization needs with talented students of GIS analysis to implement projects over a three-month period during the summer.  Below is the timeline for the 2015 program:

    • Jan 5 – Feb 8: Non-profit organizations can submit brief proposals for spatial analysis projects to Azavea
    • Feb 9 – Feb 26: Azavea program administrators review organizations and narrow the list to finalists
    • Feb 27 – Mar 15: Students submit applications including proposals to work on finalist projects
    • Mar 17 – Mar 31: Student candidate reviews and interviews
    • Apr 13: Successful Summer of Maps fellows are notified
    • May 1: Public announcement of fellows and organizations
    • June – August: Summer of Maps fellows work on spatial analysis projects
    • For the most up to date schedule, please consult the Summer of Maps site.

What benefits do non-profit orgs receive?

    • Pro-bono services from a talented student GIS analyst to geographically analyze and visualize your data
    • Visualization of data in new ways and combination of data with other demographic and geographic data to draw new observations
    • High quality maps that can be used to make a case to funders or support new initiatives

What benefits do students receive?

    • Opportunity to work on spatial analysis projects that support the social missions of non-profit organizations
    • Work directly with Azavea mentors to improve GIS skills
    • Receive a monthly stipend
    • Gain work experience implementing and managing a GIS project

If you are a non-profit organization and have a project you would like to see implemented, please submit an application.  The deadline is Sunday Feb 8th, 2015 11pm EST.  Nonprofits can check out the finalist organization proposals from 2014 for inspiration.  Keep in mind that students will be selecting from the finalist projects so identifying a project that is interesting and engaging is key in having your project be selected.  If you are a student, stay tuned – applications will open Feb 27th, 2015.

To learn more about Azavea Summer of Maps, check out the website, which has additional information on the program.

Fellowship Sponsors

We’d like to expand Summer of Maps and we’re looking for sponsors.  If you are interested in sponsoring a fellow or a mentor, please be in touch.

When Mapping Quantities, Choices Matter

An article just came up last Friday on Technically Philly that mentioned one of the winning projects from Azavea’s Open Data Philly Visualization Contest, a bike theft study by Greg Kaminsky. Greg chose to look at bicycle theft in Philadelphia, which was similar to a 2013 Summer of Maps project I completed for the Bicycle Coalition of Greater Philadelphia. Greg’s takeaway was that the highest number of bicycles stolen from one single location was 15 and this occurred right outside of City Hall. While that might be true based on the geocoding and visualization techniques used, it seems that the clustering of thefts in the data is more complex than that. Perhaps there are more significant clusters (though not falling on an exact point) located elsewhere. Based on my previous work in the area, I knew several areas other than City Hall also had high rates of theft, so I decided to explore some other metrics for analyzing data clusters.

Greg wasn’t incorrect, but his results demonstrate how the choices cartographers make during map creation can greatly affect the results. Which geographic location has the most bike thefts varies depending on a number of factors, such as the geographic scale or the normalization used. How we define geographic location is very important. For example, we could decide to use small buffers around theft incidents, look at street corners, blocks, city council or police districts, census tracts, and on and on.

I wanted to explore how the choice of geographic level by which to aggregate thefts affects where apparent clusters of theft exist. Let’s take a look at the full set of 10,747 reported thefts over six years:

Thefts from 2007-2012.

It’s not immediately very easy to understand exactly where the highest amounts of thefts are. The huge number of points ends up making everything far too busy. Now let’s see which locations have had the most bikes stolen from them based on the type of geographic clustering or aggregation we use.

Clustering by geographic boundary

1. Clustering based on City Blocks.

For this method, I identified the city block each theft fell within or was nearest, and summed the thefts per block. What we have is a map of thefts over a six-year period from January 1, 2007 to December 31, 2012 by block across Philadelphia. It appears that high-theft areas have been narrowed down considerably. The thefts per block range from two to twelve, with darker colored blocks having higher amounts of theft. Areas of high theft are immediately apparent near Temple University, University City, Center City, and in far south Philadelphia. Keep in mind, some blocks are larger in size than others, so this should not be considered an approximation for density.
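The block aggregation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the workflow actually used for the maps: the block outlines and theft coordinates are made-up values in feet, the helper names (`point_in_polygon`, `assign_block`, `thefts_per_block`) are hypothetical, and "nearest block" is approximated by distance to the average of each block's vertices rather than to its true edge.

```python
from collections import Counter
from math import hypot


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vertex_average(polygon):
    """Average of the vertices; a rough stand-in for the true centroid."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def assign_block(x, y, blocks):
    """Return the id of the block containing the point, else the nearest one."""
    for block_id, polygon in blocks.items():
        if point_in_polygon(x, y, polygon):
            return block_id

    # Assumption: "nearest" means nearest by vertex-average distance.
    def distance_to(block_id):
        cx, cy = vertex_average(blocks[block_id])
        return hypot(x - cx, y - cy)

    return min(blocks, key=distance_to)


def thefts_per_block(thefts, blocks):
    """Sum the thefts assigned to each block."""
    return Counter(assign_block(x, y, blocks) for x, y in thefts)


# Hypothetical data: two 400 ft square blocks separated by a 50 ft street.
blocks = {
    "A": [(0, 0), (400, 0), (400, 400), (0, 400)],
    "B": [(450, 0), (850, 0), (850, 400), (450, 400)],
}
thefts = [(100, 100), (200, 300), (420, 200), (600, 50), (440, 10)]
counts = thefts_per_block(thefts, blocks)
```

Thefts geocoded to the street between the two blocks, such as `(420, 200)`, fall inside neither outline and are handed to the closer block, which is why the choice of "nearest" rule can itself nudge the per-block totals.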

Thefts aggregated to each block.

The only blocks with more than ten thefts over the time period (January 2007 – December 2012) are located in University City:

Highest bike theft blocks in Philadelphia.


2. Clustering based on Census Tracts

Census tracts are areas delineated by the Census Bureau to optimally contain about 3,000 – 8,000 people each, across most of the United States. Tracts are about 1/3 the size of a typical Philadelphia neighborhood, and are extremely useful for visualizing and interpreting thousands of Census Bureau variables for demographic analysis. Here we see the highest-theft areas by census tract.

Bike thefts aggregated by census tract.

The Census tract that makes up most of University City is yet again the highest theft area. This is the only tract with more than 300 reported thefts over the 2007-2012 period.

The highest bike theft Census Tract in Philadelphia, with more than 300 thefts.


3. Clustering based on Neighborhoods

Neighborhoods are fun to use for this kind of analysis because they’re recognizable places with names and characters we can identify easily. This method again repeats the aggregation methodologies from before, just at a larger level. The highest theft neighborhoods are Washington Square West (530 thefts), Rittenhouse (790 thefts), and University City (816 thefts).

The three neighborhoods with the highest reported amounts of bike theft in Philadelphia.

It’s interesting to see how the top theft location changes based on what kind of geography we choose to summarize the data by. Here’s another wrinkle: some of these geographies have much larger perimeters or areas, sometimes because they include parks, cemeteries, or have to cover a larger geographic area to include a sufficient number of people to fit the Census requirements.

Clustering by Proximity

Maybe aggregation based on blocks, tracts, or neighborhoods isn’t the best way to measure “a location”. Perhaps the reason a certain area is hit more often by bike thieves is because of factors present only in the immediate area, such as a vacant building, poor lighting, or proximity to an easy exit. One common operation in GIS is buffering, where a circle or other shape is drawn around a feature at a given distance. If we draw buffers of a given distance around every single theft, and then count how many thefts fall inside each circle, we can find the circles or areas that have the most thefts. Let’s try this out at a few different distances.
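As a rough sketch of the buffer-and-count idea just described, the snippet below counts, for each theft, how many thefts (including itself) lie within a given radius. The coordinates are invented and assumed to be in feet in a projected coordinate system, and `buffer_counts` is a hypothetical helper name. The brute-force double loop is fine for a toy example; a dataset of roughly ten thousand points would normally call for a spatial index.

```python
from math import hypot


def buffer_counts(points, radius):
    """For each point, count the points (itself included) within `radius`."""
    counts = []
    for x0, y0 in points:
        within = sum(1 for x, y in points if hypot(x - x0, y - y0) <= radius)
        counts.append(within)
    return counts


# Hypothetical theft locations along one street, in feet.
thefts_ft = [(0, 0), (50, 0), (90, 0), (250, 0), (400, 0)]

for radius in (100, 300, 500):
    print(radius, buffer_counts(thefts_ft, radius))
# 100 [3, 3, 3, 1, 1]
# 300 [4, 4, 4, 5, 2]
# 500 [5, 5, 5, 5, 5]
```

Even in this tiny example the top location moves: at 100 feet the first three thefts tie for the highest count, while at 300 feet the fourth theft's buffer takes the lead.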

100 Foot Buffers

When 100 foot buffers are used, only a few areas in the city show up with buffers that contain more than eight thefts in a 100 foot radius over the six year period.


Theft aggregate buffers at 100 foot distances. Highest amount of thefts per buffer is 9.

300 Foot Buffers

Now let’s bump up the buffer radius around each theft to 300 feet and then see how many fall inside each buffer. Interestingly, it seems that the high theft areas have shifted. The only buffers with more than 15 thefts each are located at Walnut and Broad, 9th and South St, and 13th and Walnut:

300 Foot buffers with more than 15 reported bicycle thefts from 2007 – 2012.

500 Foot Buffers

When we enlarge the buffers to 500 feet so they contain about one square block, the high theft areas shift yet again, with the highest-count buffers all containing 25-32 thefts. Most of the thefts are now centered around the blocks of Walnut Street on either side of Broad Street, with another high area still at 9th and South Street:

500 foot buffers with more than 25 bicycle thefts from 2007-2012.


Advanced Clustering

What if the location with the highest amount of thefts isn’t a discrete, arbitrary circle somewhere in the city, but rather a contiguous area with similar characteristics? Cluster analysis looks for statistically significant and contiguous clusters of areas with similar values. When we run cluster analysis on the theft-by-block file, using inverse Manhattan distance, we get this:

This map shows three kinds of clusters: blocks with high amounts of thefts surrounded by low-theft blocks (HL), blocks with high amounts of thefts surrounded by other high-theft blocks (HH), and blocks with low amounts of thefts surrounded by high-theft blocks (LH).
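The post doesn't name the statistic behind these labels, but HH/HL/LH categories are what Anselin's Local Moran's I produces, so the sketch below assumes that method. It computes the statistic for each block using row-standardized inverse Manhattan distance weights and labels the quadrant. It is a simplified illustration: the block centroids and counts are made up, `local_moran_quadrants` is a hypothetical helper, and the significance test that a GIS tool would run (so that only significant blocks are drawn) is left out entirely.

```python
def local_moran_quadrants(centroids, values):
    """Return (local_I, quadrant) per block using inverse Manhattan weights.

    Assumes no two centroids coincide and the values are not all equal.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    m2 = sum(d * d for d in z) / n          # variance of the values
    results = []
    for i, (xi, yi) in enumerate(centroids):
        weights = []
        for j, (xj, yj) in enumerate(centroids):
            if i == j:
                weights.append(0.0)         # a block is not its own neighbor
            else:
                manhattan = abs(xi - xj) + abs(yi - yj)
                weights.append(1.0 / manhattan)
        total = sum(weights)
        # Spatial lag: weighted average deviation of the other blocks.
        lag = sum(w * zj for w, zj in zip(weights, z)) / total
        local_i = z[i] * lag / m2
        quadrant = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((local_i, quadrant))
    return results


# Hypothetical block centroids (arbitrary units) and theft counts.
centroids = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
theft_counts = [10, 12, 11, 1, 9]

results = local_moran_quadrants(centroids, theft_counts)
quadrants = [q for _, q in results]
```

Here the first three blocks come out HH, the low-count block is LH, and its above-average neighbor is HL. A positive local I marks a block that resembles its surroundings, and a negative one marks an outlier.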

Welp. Better lock up your bikes, Philadelphians.

Additional Considerations

Other things to consider:

Time: All of these operations used all six years’ worth of data. What if I’d just used 2010? 2012? The clusters, hotspots, and prime locations would probably be entirely different!

Projection: All of this map data was calculated using a projected coordinate system, North American Datum 1983 State Plane Pennsylvania South. In plain English, we chose a coordinate system designed to minimize distortion and preserve certain attributes (distance, area) at the local level. The default Tableau map projection is Web Mercator, the projection commonly used for 256×256 pixel web map tiles, and it distorts distances and areas more the further from the equator the measurements are taken. This might account for Greg’s cluster of 15 points at a small location that I couldn’t replicate.

Geocoding: Another thing to consider is geocoding inaccuracy. If many addresses could not be geocoded to the exact address, they might have defaulted to the city name (Philadelphia), which could also account for multiple locations falling at the exact same point near City Hall (perhaps a commonly used location for geocoding to “Philadelphia”).
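To put a number on the Web Mercator distortion mentioned above: on a spherical Mercator projection, distances are stretched by a factor of 1 / cos(latitude). The sketch below applies that formula at roughly Philadelphia's latitude. The latitude of 39.95° N is an approximation, and both function names are hypothetical.

```python
from math import cos, radians


def web_mercator_scale(lat_deg):
    """Linear scale factor of spherical Mercator at a given latitude."""
    return 1.0 / cos(radians(lat_deg))


def ground_distance(mercator_distance, lat_deg):
    """Approximate true distance covered by a short Mercator-unit distance."""
    return mercator_distance * cos(radians(lat_deg))


PHILADELPHIA_LAT = 39.95  # approximate latitude, in degrees north

scale = web_mercator_scale(PHILADELPHIA_LAT)
true_feet = ground_distance(100, PHILADELPHIA_LAT)
print(f"scale factor: {scale:.3f}")
print(f"a 100-unit Web Mercator buffer spans about {true_feet:.1f} ground units")
```

At this latitude the stretch is about 30 percent, so a buffer drawn as 100 feet in Web Mercator coordinates covers closer to 77 feet on the ground. That is enough to change which points fall inside a small buffer.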

Cartography is both an art and a science. The decisions the cartographer makes hugely inform the results. With the popularity of easy web-mapping tools like CartoDB, and other tools that provide light mapping capabilities like Tableau, it’s more important than ever to be a discerning producer and consumer of cartography.