Azavea Labs

Where software engineering meets GIS.

Exporting Django Querysets to CSV

At Azavea, we have numerous client projects that must provide exportable data in CSV format. The reasons for this range from simple data exchange to complex export-modify-import workflows. In order to make this process easier for django projects, we made a simple utility, django-queryset-csv, for exporting django querysets directly to HTTP responses with CSVs attached.

So you have something like this:

installation:

pip install django-queryset-csv

models.py:
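The model snippet was embedded in the original post; here is a minimal sketch along the lines of the library's README (the Person model and its fields are illustrative, not from a client project):

```python
# models.py -- a hypothetical model to export; note the verbose_name
# values, which django-queryset-csv can use for column headers.
from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=50, verbose_name='Person\'s name')
    address = models.CharField(max_length=255)
    info = models.TextField(verbose_name='Info on Person')
```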

views.py:
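The view snippet was also embedded in the original post; a sketch of the documented usage, assuming a hypothetical Person model:

```python
# views.py -- hand the queryset straight to django-queryset-csv;
# render_to_csv_response returns an HttpResponse with the CSV attached.
from djqscsv import render_to_csv_response

def csv_view(request):
    people = Person.objects.values('name', 'address', 'info')
    return render_to_csv_response(people)
```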

Pain Points

Why bother? Can’t you write a for-loop to export a CSV in a dozen or so lines of code? It turns out there are a few pain points we run into over and over again with CSV exports:

  • We’re currently using python 2.7 for our django projects, where the provided CSV library has poor support for unicode characters. This has to be addressed somehow, usually by utf-8 encoding python strings before writing them to the CSV.
  • Adding a BOM character to CSVs with utf8 encoding is required for them to open properly in Microsoft Excel.

These are delicate behaviors that we prefer to have handled by a dedicated library with adequate unit test coverage.
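To make those points concrete, here is a Python 3 sketch of what a hand-rolled export has to get right (the post targets Python 2.7, where the csv module makes this even more awkward):

```python
import codecs
import csv
import io

# Prepend the UTF-8 BOM so Excel detects the encoding, then write
# rows through a UTF-8 text wrapper.
buf = io.BytesIO()
buf.write(codecs.BOM_UTF8)
text = io.TextIOWrapper(buf, encoding='utf-8', newline='')
writer = csv.writer(text)
writer.writerow(['name', 'city'])
writer.writerow(['José', 'Köln'])
text.flush()

data = buf.getvalue()
```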

Utilities

In addition, we found ourselves repeatedly wanting the same helper utilities, and to have them work together consistently and predictably.

  • The ability to generate a filename automatically based on the underlying data.

  • The ability to generate timestamps for each export.

  • The ability to generate column headers from the underlying data, sane defaults, custom overrides, or some flexible combination of the three.

In this case, field_header_map takes precedence if provided, followed by verbose_name if specified and not disabled, followed finally by the underlying model field names.
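A sketch of how those options combine (keyword names per the library's documentation; the model and field names are hypothetical):

```python
from djqscsv import render_to_csv_response

def csv_view(request):
    qs = Person.objects.all()
    # 'info' gets the custom header; other columns fall back to
    # verbose_name, then to the raw field name.
    return render_to_csv_response(qs,
        field_header_map={'info': 'Personal Info'})
```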

Advanced Usage

Moving this library into production, we quickly discovered some advanced features that we needed the library to support.

Foreign Keys

The most obvious is foreign key fields. This is supported using the .values() method of a queryset, which is able to walk relationships in the same fashion as other ORM directives, using double-underscores. Note that you can’t make use of verbose names in this case, but you can provide custom overrides in the field_header_map argument:
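For example (a sketch; the Purchase model and its fields are hypothetical):

```python
from djqscsv import render_to_csv_response

def csv_view(request):
    # values() walks the relationship with double-underscore syntax;
    # the header override keeps 'buyer__name' out of the CSV header row.
    qs = Purchase.objects.values('item', 'buyer__name')
    return render_to_csv_response(qs,
        field_header_map={'buyer__name': 'Buyer Name'})
```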

Asynchronous Export

Sometimes, you can’t return a CSV response in the same thread. We ran into this problem because CSV exports, or the queries that produce them, sometimes take too long for the request/response cycle, and tying up a web worker for that period is unacceptable. For this case, django-queryset-csv provides a lower-level function for writing CSV data to a file-like object of any kind:
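A sketch of that lower-level usage, e.g. from a background task (the function and file path are hypothetical):

```python
from djqscsv import write_csv

def export_people():
    qs = Person.objects.values('name', 'address')
    # write_csv accepts any file-like object, so the export can run
    # outside the request/response cycle.
    with open('/tmp/people_export.csv', 'wb') as csv_file:
        write_csv(qs, csv_file)
```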

Final Thoughts

We’re using django-queryset-csv in production, but the project remains in beta. We hope it’ll make CSV exports a little less painful for you too. Please report all bugs via github.

Geoserver Timestamp Styling and PostgreSQL DateTime Fields

Think for a moment about how many ways we talk about dates and time. In regular language, we can say things like “last Tuesday” or “week after the week after next” or “Friday the 12th” and people can generally figure out when we mean. Add time into the mix and phrases start getting strange: “noon-thirty” and “half-past midnight” are my favorite oddities.

The actual date-time standard, published as ISO 8601, specifies what date-time data should look like, but programming languages, databases and other programs implement this standard with sometimes vastly different interfaces and rules.

PostgreSQL’s handling of date-time data takes the form of 6 different field types, all suited to different data needs. For the OpenTreeMap project, we’re using the timestamp data type which allows us to store both dates and times for events down to the microsecond.

Geoserver accepts a single date-time type (also called timestamp) that stores dates and times at the same resolution as PostgreSQL’s data type, so it reads PostgreSQL’s date-time data easily. Adding a date filter to Geoserver’s SLD files is also fairly easy so long as you know how Geoserver wants date-time data to be formatted. The various formats that Geoserver knows how to interpret are located here. So, if we have a field in a database table called “last_updated”, then an example SLD filter might look like this:

<Filter>
  <PropertyIsLessThan>
     <PropertyName>last_updated</PropertyName>
     <Literal>2012-01-01 00:00:00</Literal>
  </PropertyIsLessThan>
</Filter>

This filter would display any updates made before midnight on January 01, 2012. Another example (from the OGC 1.0 encoding specification’s examples) catches updates between certain dates:

<ogc:Filter>
  <ogc:PropertyIsBetween>
    <ogc:PropertyName>last_updated</ogc:PropertyName>
    <ogc:LowerBoundary>
      <ogc:Literal>2011-12-01 00:00:00</ogc:Literal>
    </ogc:LowerBoundary>
    <ogc:UpperBoundary>
      <ogc:Literal>2011-12-31 23:59:59</ogc:Literal>
    </ogc:UpperBoundary>
  </ogc:PropertyIsBetween>
</ogc:Filter>

This filter would display any updates made during the month of December, 2011. Anything that was updated outside the specified date and time parameters would bypass these filters and go on to check any other rules in the SLD file. This works great if we’re only interested in static date comparisons. But what if we want to see updates less than a week old? Or objects that haven’t been updated for more than three months? This kind of dynamic filtering is a little harder to do.

After digging through the Geoserver documentation and a lot of googling, we decided to shift the dynamic part of the filter into a PostgreSQL view. Our test view included an id field, a geometry field and a field that calculates the number of days since the last update to that object. The view sql looks like this:

CREATE OR REPLACE VIEW timestamp_test AS
  SELECT
    treemap_tree.id,
    date_part('days'::text, now() - treemap_tree.last_updated) AS days,
    treemap_tree.geometry
  FROM treemap_tree;

So now we can add this view to Geoserver as a source layer and it will see the new days field as a static number field. We can use any of the property filters on this field and style recently added data in a dynamic fashion.
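For example, a rule that highlights anything updated within the last week becomes a plain numeric comparison on the view’s days field:

```xml
<Filter>
  <PropertyIsLessThan>
     <PropertyName>days</PropertyName>
     <Literal>7</Literal>
  </PropertyIsLessThan>
</Filter>
```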

Introducing python-sld and django-sld

python-sld

python-sld is a simple python library that enables some basic manipulation of StyledLayerDescriptor (SLD) documents.

What are SLD documents?  SLD is a standard defined by the Open Geospatial Consortium, or OGC. In their words:

The OpenGIS® Styled Layer Descriptor (SLD) Profile of the OpenGIS® Web Map Service (WMS) Encoding Standard defines an encoding that extends the WMS standard to allow user-defined symbolization and coloring of geographic feature and coverage data.

In layman’s terms, SLD is a common way to style your own maps that come from any map server that speaks WMS (another standard by OGC). Of all the GIS tools available, the WMS server ecosystem is exceptionally rich and diverse. There are many proprietary choices, as well as a plethora of open source options.

State of the Art

Recently, in the course of developing new features for DistrictBuilder, we arrived at a point where we needed to generate SLDs dynamically, so we surveyed the existing python libraries.

What we were looking for was pure object model access to the components of an SLD, as well as XML validation, with very few dependencies. None of the existing projects really fit the bill, so we started working on our own.

Introducing python-sld

python-sld is an open source (Apache 2.0) library for dynamic SLD creation and manipulation. The project is hosted over on github, and the packages are in pypi (including generated inline documentation).

Features

With python-sld, creating new SLD documents is as easy as creating a new instance of a StyledLayerDescriptor object:

>>> from sld import *
>>> sld_doc = StyledLayerDescriptor()

With this SLD document, all descendants are accessed as properties, and most child objects are created off the parent with “create_xxx()” methods:

>>> sld_doc.NamedLayer is None
True
>>> nl = sld_doc.create_namedlayer('My Layer')
>>> nl.Name
'My Layer'

For most complex types, the parent’s property is an instance of the class. In our example:

>>> isinstance(nl, NamedLayer)
True
>>> us = nl.create_userstyle()
>>> us.Title = 'Style Title'
>>> us.Title
'Style Title'
>>> isinstance(us, UserStyle)
True

A couple of pythonic classes break up the monotony, too. Elements that contain collections of items (a FeatureTypeStyle element may contain many Rule elements, and the Fill, Stroke, and Font elements may contain many CssParameter elements) behave as pythonic lists.

>>> fts = us.create_featuretypestyle()
>>> len(fts.Rules)
0
>>> r1 = fts.create_rule('Criteria 1')
>>> len(fts.Rules)
1
>>> fts.Rules[0].Title == r1.Title
True

Another bit of pythonic syntactic sugar is the combination of Filters. Constructing filters (with the Rule as a parent) and combining them with “+” or “|” creates logical “AND” and “OR” filters, respectively.

>>> f1 = Filter(r1)
>>> f1.PropertyIsGreaterThan = PropertyCriterion(f1, 'PropertyIsGreaterThan')
>>> f1.PropertyIsGreaterThan.PropertyName = 'number'
>>> f1.PropertyIsGreaterThan.Literal = '-10'
>>>
>>> f2 = Filter(r1)
>>> f2.PropertyIsLessThanOrEqualTo = PropertyCriterion(f2, 'PropertyIsLessThanOrEqualTo')
>>> f2.PropertyIsLessThanOrEqualTo.PropertyName = 'number'
>>> f2.PropertyIsLessThanOrEqualTo.Literal = '10'
>>>
>>> r1.Filter = f1 + f2

When the SLD object is serialized, it will render an “ogc:And” element that contains both property comparisons. You may have noticed that both the “PropertyIsGreaterThan” and “PropertyIsLessThanOrEqualTo” properties are assigned an instance of the PropertyCriterion class. This is the common class for all property comparators. The name of the comparator determines its logical comparison (less than, greater than, equal to, etc.), and the class has PropertyName and Literal properties to control which property gets compared, and which value it is compared against.
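Combining the same two filters with “|” instead (a sketch continuing the session above) swaps the serialized element:

```python
>>> r1.Filter = f1 | f2
```

When this rule is serialized, an “ogc:Or” element wraps both property comparisons.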

Finally, serialization is performed on the main StyledLayerDescriptor object, with options to ‘prettify’ the output:

>>> content = sld_doc.as_sld(pretty_print=True)

Dependencies

The lxml library is required by python-sld. This is the library that provides the underlying parsing and serializing of the XML document, as well as the validation steps against the canonical SLD schema.

Limitations

At the current time, only a subset of the entire SLD specification is implemented. All SLD elements are parsed and stored, but only the following elements may be manipulated as objects in python-sld:

  • StyledLayerDescriptor
  • NamedLayer
  • Name (of NamedLayer)
  • UserStyle
  • Title (of UserStyle and Rule)
  • Abstract
  • FeatureTypeStyle
  • Rule
  • ogc:Filter (implicit ogc:And and ogc:Or)
  • ogc:PropertyIsNotEqualTo
  • ogc:PropertyIsLessThan
  • ogc:PropertyIsLessThanOrEqualTo
  • ogc:PropertyIsEqualTo
  • ogc:PropertyIsGreaterThanOrEqualTo
  • ogc:PropertyIsGreaterThan
  • ogc:PropertyIsLike
  • ogc:PropertyName
  • ogc:Literal
  • PointSymbolizer
  • LineSymbolizer
  • PolygonSymbolizer
  • TextSymbolizer
  • Mark
  • Graphic
  • Fill
  • Stroke
  • Font
  • CssParameter

All other SLD elements cannot be directly manipulated in python-sld, but are accessible (from a parsed SLD that is perhaps more complex) via the parent object’s _node property. This is the lxml.Element that the python-sld class represents.

django-sld

django-sld builds upon the capabilities in python-sld by enabling quick SLD generation from geographic models. This library is separate from the python-sld library because of the dependencies on django and pysal, the Python Spatial Analysis Library.

Primer on Geographic Models

I gave a quick background to geographic models in django to the Boston django meetup last week, and the slides of my presentation are available online as a presentation in Google Docs.

Introducing django-sld

django-sld is an open source (Apache 2.0) library for generating SLD documents from geographic querysets. The project is hosted over on github, and the packages are in pypi (including generated inline documentation).

Features

django-sld enables quick classification of geographic querysets by passing the data distribution of an individual model field into the classification algorithms built into pysal. Not all classification methods in pysal are available, however. At the current version (1.0.3), the following classification algorithms are supported:

  • Equal Interval
  • Fisher Jenks
  • Jenks Caspall
  • Jenks Caspall Forced
  • Jenks Caspall Sampled
  • Max P Classifier
  • Maximum Breaks
  • Natural Breaks
  • Quantiles

To classify a django queryset, use any of the as_xxx() methods in the djsld.generator module.

>>> from djsld import generator
>>> qs = MySpatialModel.objects.all()
>>> sld = generator.as_quantiles(qs, 'population', 10)

The above example assumes that you have a model named “MySpatialModel” in django’s models.py file. The result is a sld.StyledLayerDescriptor object, which may be serialized to a string with “as_sld()”:

>>> sld_content = sld.as_sld(pretty_print=True)

The “pretty_print” option is available to format the SLD in a fashion that is more readable by us humans.

In addition to simple models, django’s support for related fields really shines, as it’s possible to classify the distribution on any related field, using the “__” (double underscore) format preferred by django:

>>> sld = generator.as_quantiles(qs, 'city__population', 10)

The one caveat is that the PropertyName in the criteria will be set to this field name (which is not the way most mapping packages refer to related fields). To accommodate this difference, you may use the ‘propertyname’ keyword to control the output PropertyName:

>>> sld = generator.as_quantiles(qs, 'city__population', 10,
... propertyname='population')

Dependencies

django-sld requires python-sld and the pysal library.

Django, contests and weekly voting

I’ve written before about how OpenDataPhilly uses a ratings module to drive a nomination system. Recently, we added a contest to the site to determine what kinds of data local non-profits and the public would like to see made available. Contests generally have a winner and, in this case, we’re letting the public vote on data sets nominated by non-profits. At first glance this isn’t much different from our current nomination system, but there’s one catch; we wanted users to be able to vote for one entry once a week. Turns out this was more novel than it sounds.

Django has a few modules for rating or voting on content, one of which we’re using for the nomination and comments systems. The inner workings of the module boil down to the following rules:

  1. A user must be logged in to rate/vote
  2. A user can rate/vote for any number of items
  3. A user can only rate/vote for any particular item once (though they may change their rating/vote later)

Compare this with the rules we wanted to enforce for the contest:

  1. A user must be logged in to vote
  2. A user can only vote once per 7 day period
  3. A user can vote for an item multiple times, so long as rule 2 is preserved

Aside from the first rule, we were trying to do almost exactly the opposite of what our rating module enforced. Rather than retrofit the existing module to allow additional and sometimes contradictory behavior, we decided to write a very small voting module of our own.

The code revolves around two decision points: is voting allowed, and can a specific user vote now. The first question is answered by the contest object itself. A contest knows what its start and end dates are, so if today is after the start date and before the end date, then voting is allowed.

The second question is a bit more complicated, but not by much. Because of rule 2 above, we need to know when a user last voted to know if they’re currently allowed to vote. The database storage for a vote contains a datetime object, a foreign key to the user object and a foreign key to the contest entry so if we sort a user’s votes by time we can retrieve their latest vote.
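A minimal sketch of that storage (model and field names are assumptions, not the production code):

```python
from django.db import models
from django.contrib.auth.models import User

class Vote(models.Model):
    # When the vote was cast; drives the once-per-week rule.
    timestamp = models.DateTimeField(auto_now_add=True)
    user = models.ForeignKey(User)
    entry = models.ForeignKey('Entry')
```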

def user_can_vote(self, user):
    increment = datetime.timedelta(days=7)
    votes = user.vote_set.order_by('-timestamp')  # latest on top
    now = datetime.datetime.today()
    if now > self.end_date:
        return False  # the contest is already over
    if votes:
        next_date = votes[0].timestamp + increment
        if now < next_date:
            return False  # less than a week since the last vote
    return True

The above code gets a user’s votes and orders them by time with the most recent first. If a user has ever voted, we need to check if they’re allowed to vote again yet or if they have to wait. We calculate the earliest time that a user can vote next and check it against the date and time now. We also check the end of the contest against the date and time now. If “now” is before the next time the user can vote or “now” is after the contest’s end date, we return false; the user can’t vote now. If a user has never voted before, or the dates are all ok then the user can vote. This check is done after a user clicks the “vote” button but before a vote is saved to the database. We also display a message saying why this check failed and when a user will be able to vote again.

So we’re taking advantage of all of the spam protection built into Django’s user registration process and running a contest on surprisingly little code: 3 database tables, 200 lines of python (blank lines included) and a few templates is all we needed!

Restricting Zoom with Multiple OL Basemaps

As David recently posted, our team has been hard at work implementing DistrictBuilder, where we’ve been investing a great deal of effort on both performance and usability. One feature we added in the spring was the addition of basemaps to the user interface. Before this addition, users labored over drawing the perfect district configurations without a whole lot of context of the surrounding environment (e.g. roads, water boundaries, etc.). When the time came to add a basemap to the application, it didn’t feel right restricting it to a single type of map, or even a single provider. We wanted to let users choose the best map for the task at hand. Could an application promoting democracy really have it any other way?

We set out to support several base map options as well as any combination of options, including:

  • Bing Maps (satellite, roads and hybrid)
  • GoogleMaps (satellite, roads and hybrid)
  • ArcGIS Online (any of several maps)
  • OpenStreetMap

Since DistrictBuilder needs to be flexible enough to meet the needs of users and administrators in a variety of situations, we decided on a two-step approach to basemap configuration. First, the administrator specifies, in the configuration file, which combinations of map providers and map types may be selected. Then DistrictBuilder presents all of the configured options to the user, who can toggle among them at any time while a plan is being viewed or edited.

Here’s an example of an instance configured with an OpenStreetMap road layer, a Bing hybrid layer, and a Google satellite layer:

Road, Hybrid, and Satellite

Here’s another example with only road layers — one for each of the three configured providers:

Roads for three vendors

DistrictBuilder currently allows the configuration of basemaps using permutations of each of the three vendors and three map types described above. Adding more options is a relatively easy task, however. With the launch of Fix Philly Districts, we wanted the basemap colors to be slightly more muted than the above options, and ended up adding support for the ArcGIS Online World Topographic Map. We also experimented with the Google Maps V3 custom styling API, which looked great, but introduced performance problems when panning and zooming (animations).

There were, of course, some hoops that needed to be jumped through in order to get all of these basemaps behaving correctly on the same map, which will be discussed below. I’ve extracted the logic required to do so into a small demo that can be viewed/downloaded here.

Zoom Levels

Many of the challenges that needed to be overcome to get this working correctly were brought about because we needed to restrict the zoom levels to the area at hand. We wanted to eliminate superfluous zoom levels to ensure the user was always operating within the appropriate boundaries (note: it is not done in this demo, but in DistrictBuilder we also restrict the extent with the ‘restrictedExtent’ map parameter, so users can’t even pan outside of the area).

One difficulty with setting zoom levels on the different layers is that the layers don’t use zoom parameters consistently. In Bing (the VirtualEarth layer), minZoomLevel and maxZoomLevel are needed. In Google, minZoomLevel is needed, but it requires numZoomLevels instead of maxZoomLevel. And in OpenStreetMap (the OSM layer), well…no combination of those seems to work — we needed to slightly modify the XYZ layer (OSM’s base class) to allow maxResolution to be changed based on the minZoomLevel. To see how this is done, view the demo source. With that change in place, the list of required layer parameters is as follows:

  • Bing – minZoomLevel, maxZoomLevel, projection, sphericalMercator, maxExtent
  • Google – numZoomLevels, minZoomLevel, projection, sphericalMercator, maxExtent
  • OpenStreetMap – numZoomLevels, minZoomLevel, projection

Coordinate Systems

We also faced some problems related to coordinate systems. DistrictBuilder uses GeoServer and GeoWebCache to serve up WMS layers. The coordinate system of our data is one version of the ever-changing “Popular Visualization CRS / Mercator” projection. We needed to match the OpenLayers projection to the one used by our data, or else we saw slight offsets on our overlays. Unfortunately, the ‘projection’ layer parameter isn’t always used within the layers correctly. For example, any layer using the SphericalMercator class gets its projection automatically hardcoded to 900913. We needed to make a slight modification to the SphericalMercator class to allow the ‘projection’ parameter to carry through. This can be seen by viewing the demo source.

Bonus: Math Time!

One interesting part about implementing zoom restriction was that we needed it to work in any instance of DistrictBuilder — from large states to small towns, which may have vastly different extents. Instead of having an administrator figure out the proper minimum zoom level, we calculate it automatically based on the extent, which requires a little bit of basic algebra.

For Philadelphia, the extent of our area is:

[-8397913.926216, 4842467.609439, -8329120.600772, 4895973.529229]

In DistrictBuilder, we calculate this dynamically on the server side (using Django) by filtering all of the geounits in the database and calling the ‘extent’ function on the query set. For the demo, this is hardcoded. Here’s how to transform this extent into a Spherical Mercator minZoomLevel:

  • Find the width of the area in meters.
var studyWidthMeters = extent[2] - extent[0];
  • Find the width of the map in pixels. In the demo, this is hardcoded, because we are setting the div size of the map. In DistrictBuilder, the map takes up the whole screen, and this value is calculated on the fly based on the size of the div the map occupies.
var mapWidthPixels = 450;
  • Find the map resolution, or meters per pixel.
var resolution = studyWidthMeters / mapWidthPixels;
  • Find the maximum map resolution. In Spherical Mercator, the maximum resolution is one 256×256 tile spanning the entire circumference of the Earth. So dividing the circumference of the earth (~40,000 km) by 256 gives us the maximum meters per pixel, which is a constant.
var maxResolution = 156543.033928;
  • Spherical Mercator zoom levels work like a pyramid. Each zoom breaks the current tile up into a 2×2 group of 256×256 tiles, essentially halving the resolution each time. Therefore, finding the resolution at a given zoom level looks like this:
maxresolution / 2^zoom = resolution
  • We know the resolution and max resolution already, and need to find the zoom:
zoom = log(maxresolution/resolution)/log(2)
  • Or in javascript:
var minZoom = Math.log(maxResolution / resolution) / Math.LN2;
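Pulling the steps together into a short Python sketch (the extent and pixel width are the Philadelphia values above; DistrictBuilder’s actual implementation is the javascript shown in the steps):

```python
import math

def min_zoom(extent, map_width_pixels, max_resolution=156543.033928):
    # Smallest Spherical Mercator zoom at which the extent fits the map.
    study_width_meters = extent[2] - extent[0]
    resolution = study_width_meters / map_width_pixels
    return math.log(max_resolution / resolution) / math.log(2)

extent = [-8397913.926216, 4842467.609439, -8329120.600772, 4895973.529229]
zoom = min_zoom(extent, 450)  # works out to roughly 10 for Philadelphia
```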