Azavea Labs

Where software engineering meets GIS.

Building Districts in Web-Time

DistrictBuilder logoMost recently, the Politics, Redistricting and Elections team has been working closely with the Public Mapping Project to build DistrictBuilder, an open source, web-based application that enables regular citizens to use powerful tools to draw their own legislative districts. If you’ve seen how badly the professionals can mangle districts (Exhibit A, Exhibit B, etc), it’s easy to imagine that any given citizen, given the right tools, could do it better.

We spent quite a bit of time making the application easy to use and responsive in modern desktop web browsers.  The “easy to use” part was tackled by our excellent UI/UX design team. The “responsive” part was the domain of  our engineers.  That’s where the fun began for me.

DistrictBuilder is designed to use any polygon shapefile, transform it into an internal data model, then make that accessible via map tiles and geometric features.  When serving map tiles, we use GeoServer and GeoWebCache to generate the tiles and cache them, respectfully. This performance is great — pre-generated map tiles are the best we can aim for with respect to the base map tiles. Serving geometric features at full resolution, however, introduces a slew of problems. A few that stood out right away:

  • Web Browser Limitations — 9 out of 10 experts agree: too many map features has a significant performance impact on web browsers, with the greatest impact on the Microsoft Internet Explorer browser.
  • Excessive Coordinates — delivering lots of polygon coordinate pairs that the user may never see consumes valuable bandwidth and rendering time.
  • Server Processing Time — recalculating state-wide geometric features consumes valuable CPU time.

Web Browser Limitations

First, we tackled the browser performance issues. A sluggish browser is the kiss of death in the web world, and we had to make the application experience as fast as possible before looking at the server processing time.

We originally gave users the power to create highly detailed districts at the statewide level, but realized that no modern web browser could handle the volume of polygon features that would need to be served to represent an entire state.  In order to mitigate this limitation, we limited the size and number of features sent to the browser. With some scale-dependent logic, a user zoomed in to a detail of a district can finely tune the boundary by moving smaller geographic features (e.g. census blocks), and a user zoomed out to the state-wide level can manipulate the districts by moving large geographic features only (e.g. counties). In addition, when editing the finest details, we limit the number of features a user can move in a single edit.

Excessive Coordinates

The next thing to go was the set of full resolution geometries. In DistrictBuilder, users never actually see the full geometries, but an adaptively simplified (sometimes called generalized) geometry; depending on the scale of the map view, the server will deliver geometries with appropriate coordinate resolutions. Simply put: as you zoom in on the map, you get more detail in the geometries.

By simplifying counties, the geometries are reduced from 166,958 points to 4,821. When a user is zoomed out, there is no noticeable difference between these geometries!  However, as the user is interacting with higher resolution maps, DistrictBuilder loads in higher-resolution geometries on demand. The following images demonstrate the difference in the geometry detail:

Low Resolution Transition

The zoomed in County layer, with a low resolution district overlay (orange line). There are currently 1,414 coordinates in this view of the district overlay.

High Resolution Transition

The zoomed in VTD layer, with a high resolution district overlay (orange line). There are currently 3,253 coordinates in this view of the district overlay.

You can notice the differences in the district detail if you look closely at the orange district boundary. This transition happens seamlessly in the application, loading in the higher resolution geometries as web users zoom in to areas of interest.

We also eliminated coordinates that you never see.  It made no sense to serve  coordinates that were located in the opposite side of the state where a user was editing, just like you wouldn’t expect to get an encyclopedia in the mail when releasing an RFI. With the OpenLayers library, Strategies came in handy here, particularly BBOX.

Server Processing Time

After we had optimized the performance of the user interface, we shifted our focus to the server-side processing.  One of the features that makes DistrictBuilder such a powerful tool is the accuracy of the underlying data and constant feedback of important district statistics. In order to calculate all these statistics on the fly, it is necessary to leverage some tricks already mentioned with respect to map tiles: caching and generalizing.

Computation of the district statistics must happen every time a district boundary is changed. A naive solution to this problem would be to aggregate the values within the boundary every time a change is made.  This approach results in horrible performance. Instead, we just determine what has changed — which areas were added, which areas were removed — and recompute the delta, or change, on the previous district value.

Another trick to optimizing performance is in the way we determine the changing boundaries.  I’ll describe the problem using the census geographies of counties, tracts, and blocks. The structure and detail of the underlying data yielded computationally expensive queries against the block geometries.  We came up with a method of searching for the geographies in a hierarchical fashion — searching the counties first, then continuing to the next smallest-scale geography only if there was any remaining geometry left in the query.  We did the same for the tracts, and took a shortcut at the block level to exclude the block geometries.  This increased server side performance considerably.

King William County

King William county is comprised of 22 Voter Tabulation Districts and 1,527 Census Blocks.

Consider the following scenario: a user wants to move King William County (highlighted in yellow) from District 1, which is over populated, to District 3, which is under populated. Changing the boundaries with all the blocks in King William County would require testing at least 4,000 blocks for spatial intersections, then aggregating 1,527 data values, and recomputing the spatial aggregate (union) of those 1,527 geometries. With our hierarchical approach, we can change the boundary of the district with the county boundary, and change the population totals by the county’s population. A few orders of magnitude fewer operations to perform, and much faster from the user’s perspective.

Lessons Learned

Throughout the DistrictBuilder development process, the same core performance challenge has arisen: the volume of data must be reduced. This applies to all aspects of the application:

  • Map Tiles: pre-render tiles to keep the number of rendered tiles to a minimum at runtime.
  • Map Features: deliver to the browser only as much information as you can see (perhaps even less).
  • Database Queries: do anything possible to ensure that geometric operations are performed on simplified geometries.
  • Aggregating Statistics: cache whatever you can, and only compute the difference from the last cache state.

The above steps reduced the sheer number of operations and volume of processing that both the server and browser need to complete when creating new districts. These are lessons that translate well to any “big data” problem, and are crucial in bringing sophisticated GIS operations to the web.

Pending edit system using Django

A common concern when we talk to people about OpenTreeMap is how much to trust the public with an organisation’s tree inventory. Every implementation of this open source system has a different answer. The original site, UrbanForestMap.org, allows a logged-in user to edit almost every bit of information they gather about a tree. PhillyTreeMap.org requires a certain level of reputation before a user can edit everything, but even a new user has considerable edit capabilities. The most recent implementation (still a work-in-progress) introduces a bit of oversight to public edits. The managing group wanted to double-check changes to officially inventoried trees, but didn’t want to get in the way of people adding and editing their own trees.

Lets look at how this changes the user story first:

A logged-in user makes an edit to a tree. The system needs to decide if these changes are applied to the tree or placed in a pending queue. If this is a publicly-entered tree, the changes are applied to the tree. (Start new requirements) If this is an inventory tree, and the user isn’t a member of a management group, add the change to the pending queue. Display any pending changes reasonably near the appropriate current value. (End new requirements)

Most of this happens behind the scenes in the saving logic. I added a bit of code to the top of our tree updater to check if the pending system is active, the user’s permissions and the tree’s origins. If everything checks out, the change goes straight into the updater code. For changes that go into the pending queue, the path to becoming an official change is a little more tortuous.

Since we’re storing these changes for later review, they have to go into the database. I created a new table to hold onto the original tree’s id, the field being changed and the new value as well as the user who submitted it, a date/time stamp and a status field. Each pending change is stored separately; even if the user makes more than one change to the tree, each ‘pend’ can be applied individually.

The rest of the pending system is eye-candy and a bit of slightly tedious templating. Almost every field on a tree’s detail page now needs to check two new things: are there any pending changes for this field, and does this user have permission to approve/disapprove pending edits. If there are pending edits, the new values are added below the current official value. When a managing user views the page, small approve and disapprove buttons also appear next to each pending change. Throw in a management-access-only page for some bulk evaluation and the system is complete!

Bring on the data focused basemaps, Esri

It’s great to read that Esri is working on features and basemaps to include within ArcGIS.com to support data visualization.     Sometimes the map isn’t the focus; sometimes the data is the focus.  A few weeks ago, Bern Szukalski wrote an article for the Esri Insider blog that spoke about Esri’s efforts to create new basemaps including basemaps for data visualization purposes.   I think this is a great move for Esri.   Last fall, I suggested such a muted basemap via the ideas.arcgis.com portal, so I was quite excited to hear it is in the works.

A post today, also by Bern, explained a new feature within ArcGIS.com to allow the user to mute the basemap by adjusting it’s display transparency.    The HunchLab team stumbled upon this idea a few months back and it’s been a great way to use the existing topographic basemap.    In our demo instance of HunchLab, we are using the ArcGIS.com Topographic tiles set to a transparency of 60%.   You can see what it looks like below.

Kudos to Esri — keep the basemap options coming!