Most recently, the Politics, Redistricting and Elections team has been working closely with the Public Mapping Project to build DistrictBuilder, an open source, web-based application that enables regular citizens to use powerful tools to draw their own legislative districts. If you’ve seen how badly the professionals can mangle districts (Exhibit A, Exhibit B, etc), it’s easy to imagine that any given citizen, given the right tools, could do it better.
We spent quite a bit of time making the application easy to use and responsive in modern desktop web browsers. The “easy to use” part was tackled by our excellent UI/UX design team. The “responsive” part was the domain of our engineers. That’s where the fun began for me.
DistrictBuilder is designed to use any polygon shapefile, transform it into an internal data model, then make that accessible via map tiles and geometric features. When serving map tiles, we use GeoServer and GeoWebCache to generate the tiles and cache them, respectfully. This performance is great — pre-generated map tiles are the best we can aim for with respect to the base map tiles. Serving geometric features at full resolution, however, introduces a slew of problems. A few that stood out right away:
- Web Browser Limitations — 9 out of 10 experts agree: too many map features has a significant performance impact on web browsers, with the greatest impact on the Microsoft Internet Explorer browser.
- Excessive Coordinates — delivering lots of polygon coordinate pairs that the user may never see consumes valuable bandwidth and rendering time.
- Server Processing Time — recalculating state-wide geometric features consumes valuable CPU time.
Web Browser Limitations
First, we tackled the browser performance issues. A sluggish browser is the kiss of death in the web world, and we had to make the application experience as fast as possible before looking at the server processing time.
We originally gave users the power to create highly detailed districts at the statewide level, but realized that no modern web browser could handle the volume of polygon features that would need to be served to represent an entire state. In order to mitigate this limitation, we limited the size and number of features sent to the browser. With some scale-dependent logic, a user zoomed in to a detail of a district can finely tune the boundary by moving smaller geographic features (e.g. census blocks), and a user zoomed out to the state-wide level can manipulate the districts by moving large geographic features only (e.g. counties). In addition, when editing the finest details, we limit the number of features a user can move in a single edit.
The next thing to go was the set of full resolution geometries. In DistrictBuilder, users never actually see the full geometries, but an adaptively simplified (sometimes called generalized) geometry; depending on the scale of the map view, the server will deliver geometries with appropriate coordinate resolutions. Simply put: as you zoom in on the map, you get more detail in the geometries.
By simplifying counties, the geometries are reduced from 166,958 points to 4,821. When a user is zoomed out, there is no noticeable difference between these geometries! However, as the user is interacting with higher resolution maps, DistrictBuilder loads in higher-resolution geometries on demand. The following images demonstrate the difference in the geometry detail:
You can notice the differences in the district detail if you look closely at the orange district boundary. This transition happens seamlessly in the application, loading in the higher resolution geometries as web users zoom in to areas of interest.
We also eliminated coordinates that you never see. It made no sense to serve coordinates that were located in the opposite side of the state where a user was editing, just like you wouldn’t expect to get an encyclopedia in the mail when releasing an RFI. With the OpenLayers library, Strategies came in handy here, particularly BBOX.
Server Processing Time
After we had optimized the performance of the user interface, we shifted our focus to the server-side processing. One of the features that makes DistrictBuilder such a powerful tool is the accuracy of the underlying data and constant feedback of important district statistics. In order to calculate all these statistics on the fly, it is necessary to leverage some tricks already mentioned with respect to map tiles: caching and generalizing.
Computation of the district statistics must happen every time a district boundary is changed. A naive solution to this problem would be to aggregate the values within the boundary every time a change is made. This approach results in horrible performance. Instead, we just determine what has changed — which areas were added, which areas were removed — and recompute the delta, or change, on the previous district value.
Another trick to optimizing performance is in the way we determine the changing boundaries. I’ll describe the problem using the census geographies of counties, tracts, and blocks. The structure and detail of the underlying data yielded computationally expensive queries against the block geometries. We came up with a method of searching for the geographies in a hierarchical fashion — searching the counties first, then continuing to the next smallest-scale geography only if there was any remaining geometry left in the query. We did the same for the tracts, and took a shortcut at the block level to exclude the block geometries. This increased server side performance considerably.
Consider the following scenario: a user wants to move King William County (highlighted in yellow) from District 1, which is over populated, to District 3, which is under populated. Changing the boundaries with all the blocks in King William County would require testing at least 4,000 blocks for spatial intersections, then aggregating 1,527 data values, and recomputing the spatial aggregate (union) of those 1,527 geometries. With our hierarchical approach, we can change the boundary of the district with the county boundary, and change the population totals by the county’s population. A few orders of magnitude fewer operations to perform, and much faster from the user’s perspective.
Throughout the DistrictBuilder development process, the same core performance challenge has arisen: the volume of data must be reduced. This applies to all aspects of the application:
- Map Tiles: pre-render tiles to keep the number of rendered tiles to a minimum at runtime.
- Map Features: deliver to the browser only as much information as you can see (perhaps even less).
- Database Queries: do anything possible to ensure that geometric operations are performed on simplified geometries.
- Aggregating Statistics: cache whatever you can, and only compute the difference from the last cache state.
The above steps reduced the sheer number of operations and volume of processing that both the server and browser need to complete when creating new districts. These are lessons that translate well to any “big data” problem, and are crucial in bringing sophisticated GIS operations to the web.