Azavea Labs

Where software engineering meets GIS.

Geoserver Timestamp Styling and PostgreSQL DateTime Fields

Think for a moment about how many ways we talk about dates and time. In regular language, we can say things like “last Tuesday” or “week after the week after next” or “Friday the 12th” and people generally can figure out when we’re talking about. Add time into the mix and phrases start getting strange: “noon-thirty” and “half-past midnight” are my favorite oddities.

The actual date-time standard is published as ISO 8601 and talks about what date-time data should look like, but programming languages, databases and other programs implement this standard with sometimes vastly different interfaces and rules.

PostgreSQL’s handling of date-time data takes the form of 6 different field types, all suited to different data needs. For the OpenTreeMap project, we’re using the timestamp data type which allows us to store both dates and times for events down to the microsecond.

Geoserver accepts a single date-time type (also called timestamp) that stores dates and times at the same resolution as PostgreSQL’s data type, so it reads PostgreSQL’s date-time data easily. Adding a date filter to Geoserver’s SLD files is also fairly easy so long as you know how Geoserver wants date-time data to be formatted. The various formats that Geoserver knows how to interpret are located here. So, if we have a field in a database table called “last_updated”, then an example SLD filter might look like this:

<Filter>
  <PropertyIsLessThan>
     <PropertyName>last_updated</PropertyName>
     <Literal>2012-01-01 00:00:00</Literal>
  </PropertyIsLessThan>
</Filter>

This filter would display any updates made before midnight on January 01, 2012. Another example (from the OGC 1.0 encoding specification’s examples) catches updates between certain dates:

<ogc:Filter>
  <ogc:PropertyIsBetween>
    <ogc:PropertyName>last_updated</ogc:PropertyName>
    <ogc:LowerBoundary>
      <ogc:Literal>2011-12-01 00:00:00</ogc:Literal>
    </ogc:LowerBoundary>
    <ogc:UpperBoundary>
      <ogc:Literal>2011-12-31 23:59:59</ogc:Literal>
    </ogc:UpperBoundary>
  </ogc:PropertyIsBetween>
</ogc:Filter>

This filter would display any updates made during the month of December, 2011. Anything that was updated outside the specified date and time parameters would bypass these filters and go on to check any other rules in the SLD file. This works great if we’re only interested in static date comparisons. But what if we want to see updates less than a week old? Or objects that haven’t been updated for more than three months? This kind of dynamic filtering is a little harder to do.

After digging through the Geoserver documentation and a lot of googling, we decided to shift the dynamic part of the filter into a PostgreSQL view. Our test view included an id field, a geometry field and a field that calculates the number of days since the last update to that object. The view sql looks like this:

CREATE OR REPLACE VIEW timestamp_test AS
  SELECT
    treemap_tree.id,
    date_part('days'::text, now() - treemap_tree.last_updated) AS days,
    treemap_tree.geometry
  FROM treemap_tree;

So now we can add this view to Geoserver as a source layer and it will see the new days field as a static number field. We can use any of the property filters on this field and style recently added data in a dynamic fashion.

Django, contests and weekly voting

I’ve written before about how OpenDataPhilly uses a ratings module to drive a nomination system. Recently, we added a contest to the site to determine what kinds of data local non-profits and the public would like to see made available. Contests generally have a winner and, in this case, we’re letting the public vote on data sets nominated by non-profits. At first glance this isn’t much different from our current nomination system, but there’s one catch; we wanted users to be able to vote for one entry once a week. Turns out this was more novel than it sounds.

Django has a few modules for rating or voting on content, one of which we’re using for the nomination and comments systems. The inner-workings of the module boil down to the following rules:

  1. A user must be logged in to rate/vote
  2. A user can rate/vote for any number of items
  3. A user can only rate/vote for any particular item once (though they may change their rating/vote later)

Compare this with the rules we wanted to enforce for the contest:

  1. A user must be logged in to vote
  2. A user can only vote once per 7 day period
  3. A user can vote for an item multiple times, so long as rule 2 is preserved

Aside from the first rule, we were trying to do almost exactly the opposite of what our rating module enforced. Rather than retrofit the existing module to allow additional and sometimes contradictory behaviour, we decided to write a very small voting module of our own.

The code revolves around two decision points: is voting allowed and can a specific user vote now. The first question is answered by the contest object itself. A contest knows when it’s starting and ending date are, so if today is after the start date and before the end date, then voting is allowed.

The second question is a bit more complicated, but not by much. Because of rule 2 above, we need to know when a user last voted to know if they’re currently allowed to vote. The database storage for a vote contains a datetime object, a foreign key to the user object and a foreign key to the contest entry so if we sort a user’s votes by time we can retrieve their latest vote.

def user_can_vote(self, user):
    increment = datetime.timedelta(days=7)
    votes = user.vote_set.order_by('-timestamp') #latest on top
    if votes:
        next_date = votes[0].timestamp + increment
        if datetime.datetime.today() < next_date and dt.today() < contest.end_date:
            return False
    return True

The above code gets a user’s votes and orders them by time with the most recent first. If a user has ever voted, we need to check if they’re allowed to vote again yet or if they have to wait. We calculate the earliest time that a user can vote next and check it against the date and time now. We also check the end of the contest against the date and time now. If “now” is before the next time the user can vote or “now” is after the contest’s end date, we return false; the user can’t vote now. If a user has never voted before, or the dates are all ok then the user can vote. This check is done after a user clicks the “vote” button but before a vote is saved to the database. We also display a message saying why this check failed and when a user will be able to vote again.

So we’re taking advantage of all of the spam protection built into Django’s user registration process and running a contest on surprisingly little code: 3 database tables, 200 lines of python (blank lines included) and a few templates is all we needed!

Pending edit system using Django

A common concern when we talk to people about OpenTreeMap is how much to trust the public with an organisation’s tree inventory. Every implementation of this open source system has a different answer. The original site, UrbanForestMap.org, allows a logged-in user to edit almost every bit of information they gather about a tree. PhillyTreeMap.org requires a certain level of reputation before a user can edit everything, but even a new user has considerable edit capabilities. The most recent implementation (still a work-in-progress) introduces a bit of oversight to public edits. The managing group wanted to double-check changes to officially inventoried trees, but didn’t want to get in the way of people adding and editing their own trees.

Lets look at how this changes the user story first:

A logged-in user makes an edit to a tree. The system needs to decide if these changes are applied to the tree or placed in a pending queue. If this is a publicly-entered tree, the changes are applied to the tree. (Start new requirements) If this is an inventory tree, and the user isn’t a member of a management group, add the change to the pending queue. Display any pending changes reasonably near the appropriate current value. (End new requirements)

Most of this happens behind the scenes in the saving logic. I added a bit of code to the top of our tree updater to check if the pending system is active, the user’s permissions and the tree’s origins. If everything checks out, the change goes straight into the updater code. For changes that go into the pending queue, the path to becoming an official change is a little more tortuous.

Since we’re storing these changes for later review, they have to go into the database. I created a new table to hold onto the original tree’s id, the field being changed and the new value as well as the user who submitted it, a date/time stamp and a status field. Each pending change is stored separately; even if the user makes more than one change to the tree, each ‘pend’ can be applied individually.

The rest of the pending system is eye-candy and a bit of slightly tedious templating. Almost every field on a tree’s detail page now needs to check two new things: are there any pending changes for this field, and does this user have permission to approve/disapprove pending edits. If there are pending edits, the new values are added below the current official value. When a managing user views the page, small approve and disapprove buttons also appear next to each pending change. Throw in a management-access-only page for some bulk evaluation and the system is complete!

Using the CQL_FILTER parameter with OpenLayers WMS layers

I’ve used Openlayer’s Marker layer in several projects and have always just accepted that I can’t display more than around 500 markers at a time for a given query. Recently, I found another way. We’re using GeoServer as a WMS tile server for the tree and municipal boundary layers in PhillyTreeMap.org. GeoServer’s WMS implementation allows an additional parameter in the url called CQL_FILTER. This parameter allows you to use a little language called Common Query Language, or CQL, to apply data filters to the tiles that GeoServer generates. CQL is a plain text, human readable query language created by the OGC, but I like to think of it as an extremely limited third-cousin-by-adoption of SQL. I haven’t found too much in the way of documentation on this obscure little gem, so here’s a rundown of how we used it to display search results in PhillyTreeMap.org.

If you look through the CQL and ECQL page in GeoServer’s documentation, there are several examples but they don’t cover everything you can do with CQL. Basically, a CQL filter is a list of phrases similar to where clauses in SQL, separated by the combiner words AND or OR. You can use the following operators in a CQL phrase:

  • Comparison operations: =, <, >, and combinations
  • Math expressions: +, -, *, /
  • NOT
  • IS, EXISTS
  • BETWEEN, BEFORE, AFTER
  • IN
  • LIKE
  • Geometric operators: CONTAINS, BBOX, DWITHIN, CROSS(ES), INTERSECT(S)

Some of these operations have examples in the GeoServer documentation, and others can be inferred from the GeoTools documentation (the stuff in their CQL.toFilter() calls). CQL can also call any of GeoServer’s filter functions.You can add parenthesis as needed to affect the order the filters are evaluated in, just like in SQL.

CQL has a lot of power for such a short spec, but it has a one very large deficiency that requires some database designing to avoid: the utter lack of join support. This makes sense when you consider that GeoServer doesn’t know about joins either. Ultimately, you’re using CQL against the GeoServer layer, not the underlying database structure. Building views or adding reference columns to the table GeoServer is accessing can help get around this.

In PhillyTreeMap.org, we use 4 types of CQL filters: id lookups using =, null checks using IS, date and integer ranges using BETWEEN and text searches using LIKE. Here are examples of those uses along with some array joining to get a valid CQL filter string at the end:

filter_list = []
filter_list.append("species_id = 212")
filter_list.append("height IS NULL")
filter_list.append("dbh BETWEEN 10 and 20")
filter_list.append("neighborhood_id_list LIKE '%42%' ")
cql = ' AND '.join(filter_list)
# should look like this: "species_id = 212 AND height IS NULL AND dbh BETWEEN 10 and 20 AND neighborhood_id_list LIKE '%42%' "

The above CQL filter would locate trees in a specific species, have no height value, only have dbh values between 10 and 20 inches and are in a particular neighborhood. The neighborhood_id_list filter would have been a join if this were written in standard SQL since neighborhoods and trees have a many-to-many relationship in our database. Since we can’t do joins, any time a tree is added or the location is updated, all of it’s related geographies’ ids are added to a reference column on the tree, and used specifically for this type of query.

CQL is passed to GeoServer in the same way as any other WMS variable. We’re using openlayers, so most of the WMS configuration variables are already set when we create the layer. The WMS layer has a little method called mergeNewParams that lets us change those parameters after the layer has been initialized. It also automatically redraws the layer, so the changes take place immediately. To add CQL to the WMS call, just add the CQL_FILTER variable to the layer’s parameters and the layer should update.

wms_layer.mergeNewParams({'CQL_FILTER': "species_id = 212 AND height IS NULL AND dbh BETWEEN 10 and 20 AND neighborhood_id_list LIKE '%42%' "});

You can remove any filters by deleting the parameter from the layer as if it was a normal javascript object. You’ll need to redraw the layer yourself before the change will be visible.

delete wms_layer.params.CQL_FILTER;
wms_layer.redraw();

Using django_sorting without text anchors

In creating the OpenDataPhilly website, we knew we needed to pay extra attention to the usability and features available on the search page. After all, what use is a data catalog if you can’t rely on the search results? While designing the page, we decided we wanted to include several ways to sort and filter results along with the standard text search. I’d used directeur’s django_sorting module before, and was highly impressed at how well it integrated with other info already in search parameters, so I decided to use it again. The only thing keeping me from seamlessly dropping in this module was our desire to use images instead of words for the “click on this to sort” link. Django_sorting was built with table headers in mind; so much so that the examples are all about table header tags with a link inside them. We had a slightly different implementation in mind.

The first hurdle that you might think of would be to not require a table structure. Thankfully, django_sorting doesn’t care how you display the data, it only cares about the fields you want to sort on. You can put the sorting links anywhere and it just works. So far, so good.

The second possible hurdle here is that the anchor template tag specification calls for two parameters: the field to sort on, and a string for the link. Since we didn’t want a text link, we really didn’t care about the second parameter. To my surprise, neither did django_sorting! So our anchor template tags look something like this:

<li id="sort_rating_score">{% anchor rating_score %}</li>

This winds up creating a link that still has something for the title attribute and for the inner text:

<li ...><a title="Rating_score" href="/blah/blah/?sort=rating_score">Rating_score</a></li>

So our template tag is nice and clean, but we still have to deal with some text in the link. Using either straight-up javascript, or some smaller and nicer jQuery, removing the innerHTML is easy so long as the dom can uniquely identify our sort links. I just gave the link’s parent container an id and cleared the innerHTML of each link. At the same time, I added a class and some css to define the image and size.

So now we have a django_sorting anchor with an image instead of the default text link. All done, right? Not quite. We didn’t just want to use an image here, we wanted some mouseover and focus sprite action, too. Another chuck of jQuery, and a querystring plugin, helps us out again:

$("#sort .sort_image").each(function () {
    $(this).hover(function() {
        this.style.backgroundPosition="0 -89px"; //the hover image location
    }, function () {
        var filter_split = this.parentNode.id.split('sort_');
        if ($.query.get('sort') && $.query.get('sort') == filter_split[1]) {
            this.style.backgroundPosition="0 -45px";  //the active image location
        } else {
            this.style.backgroundPosition="0 0"; //the non-active image location
        }
    });
 });