Tag Archives: GeoServer

Geoserver Timestamp Styling and PostgreSQL DateTime Fields

Think for a moment about how many ways we talk about dates and time. In regular language, we can say things like “last Tuesday” or “week after the week after next” or “Friday the 12th” and people generally can figure out when we’re talking about. Add time into the mix and phrases start getting strange: “noon-thirty” and “half-past midnight” are my favorite oddities.

The actual date-time standard is published as ISO 8601 and talks about what date-time data should look like, but programming languages, databases and other programs implement this standard with sometimes vastly different interfaces and rules.

PostgreSQL’s handling of date-time data takes the form of 6 different field types, all suited to different data needs. For the OpenTreeMap project, we’re using the timestamp data type which allows us to store both dates and times for events down to the microsecond.

Geoserver accepts a single date-time type (also called timestamp) that stores dates and times at the same resolution as PostgreSQL’s data type, so it reads PostgreSQL’s date-time data easily. Adding a date filter to Geoserver’s SLD files is also fairly easy so long as you know how Geoserver wants date-time data to be formatted. The various formats that Geoserver knows how to interpret are located here. So, if we have a field in a database table called “last_updated”, then an example SLD filter might look like this:

<Filter>
  <PropertyIsLessThan>
     <PropertyName>last_updated</PropertyName>
     <Literal>2012-01-01 00:00:00</Literal>
  </PropertyIsLessThan>
</Filter>

This filter would display any updates made before midnight on January 01, 2012. Another example (from the OGC 1.0 encoding specification’s examples) catches updates between certain dates:

<ogc:Filter>
  <ogc:PropertyIsBetween>
    <ogc:PropertyName>last_updated</ogc:PropertyName>
    <ogc:LowerBoundary>
      <ogc:Literal>2011-12-01 00:00:00</ogc:Literal>
    </ogc:LowerBoundary>
    <ogc:UpperBoundary>
      <ogc:Literal>2011-12-31 23:59:59</ogc:Literal>
    </ogc:UpperBoundary>
  </ogc:PropertyIsBetween>
</ogc:Filter>

This filter would display any updates made during the month of December, 2011. Anything that was updated outside the specified date and time parameters would bypass these filters and go on to check any other rules in the SLD file. This works great if we’re only interested in static date comparisons. But what if we want to see updates less than a week old? Or objects that haven’t been updated for more than three months? This kind of dynamic filtering is a little harder to do.

After digging through the Geoserver documentation and a lot of googling, we decided to shift the dynamic part of the filter into a PostgreSQL view. Our test view included an id field, a geometry field and a field that calculates the number of days since the last update to that object. The view sql looks like this:

CREATE OR REPLACE VIEW timestamp_test AS
  SELECT
    treemap_tree.id,
    date_part('days'::text, now() - treemap_tree.last_updated) AS days,
    treemap_tree.geometry
  FROM treemap_tree;

So now we can add this view to Geoserver as a source layer and it will see the new days field as a static number field. We can use any of the property filters on this field and style recently added data in a dynamic fashion.

Putting the Fun in FOSS

I went to the State of the Map (SotM) and Free and Open Source Software for Geospatial (FOSS4G) Conference in Denver, CO last week, where I was surrounded by geospatial users, developers, and architects. I had the opportunity to attend some workshops and learn about a slew of awesome projects — I’m itching to start incorporating many of these new tools and techniques into our solutions.

Node.js

I was able to attend some of the workshops — “You’ve got Javascript in your backend” with Node.js and Polymaps was a great beginner workshop, introducing lightweight servers and client mapping libraries. I was amazed that a basic web server in node.js is only 5 lines of code. Equally amazing was seeing what capabilities Polymaps had when it weighted in at only 32K (minified) vs. OpenLayers at 1.2M (minified default build).

i2maps + pico

Some exciting visualization tools are coming out of the National Center for Geocomputation at the National University of Ireland, in the form of i2maps. While it’s relatively immature (not much in the form of documentation), most the basic functionality builds off of OpenLayers.  Since I’ve already learned the OpenLayers library, I has a short learning curve, and was able to get up to speed pretty quickly.  Their library incorporates some awesome features like dynamically loading and evaluating rasters via canvas (this only works on modern browsers), and even agent-based modeling. I could have stayed in that workshop for a week.

A byproduct of the i2maps project is pico. Pico is a bridge between Python and Javascript, enabling you to call native Python methods directly from Javascript. It performs all the plumbing for you, allowing you to write a simple callback to handle your method’s return value. It also takes care of converting Python objects into Javascript objects, allowing you to pass all sort of data back and forth (including rasters!).

mod-geocache

Another new project from a contributor to the MapServer project is mod-geocache, a tile caching service as an Apache module. This skips a lot of overhead (no proxying, no interpreters, no CGI), and is very fast. In addition, the C implementation has excellent speed and performance. You can perform on the fly tile merging, quantization, and recompression. I’m excited about this module, and the promise of caching with an Apache server (looks like it has more features than mod_tile).

Geoserver

Geoserver‘s next release is also going to include some great features. The ones that really jumped out at me:

  • Time and elevation filters — e.g. storm tracking, where you can limit the features by a time field.
  • Styling SLDs in data units — e.g. “road is 5m wide”, and changes dynamically with scale. This greatly simplifies scale-dependent renderers.
  • Georeferencing of layers can be done in the admin interface.
  • Layers can be view definitions — you don’t have to roll your own views prior to creating the layer.
  • Virtual Services — partition the data layers by workspace.

These aren’t all the new features; take a look at the laundry list yourself, and prepare to be impressed.

Mapnik 2

I think the reason for calling it Mapnik2 is that it is literally twice as awesome as it was before. I learned about the new features in Mapnik2 in the lightning talks at SotM, and I think this was one of the few talks that made you feel like you were actually struck by lightning. I can’t remember half the slides in the talk, but the supported formats, reprojection, styling, and speed improvements left me with my head spinning.

Building Districts in Web-Time

DistrictBuilder logoMost recently, the Politics, Redistricting and Elections team has been working closely with the Public Mapping Project to build DistrictBuilder, an open source, web-based application that enables regular citizens to use powerful tools to draw their own legislative districts. If you’ve seen how badly the professionals can mangle districts (Exhibit A, Exhibit B, etc), it’s easy to imagine that any given citizen, given the right tools, could do it better.

We spent quite a bit of time making the application easy to use and responsive in modern desktop web browsers.  The “easy to use” part was tackled by our excellent UI/UX design team. The “responsive” part was the domain of  our engineers.  That’s where the fun began for me.

DistrictBuilder is designed to use any polygon shapefile, transform it into an internal data model, then make that accessible via map tiles and geometric features.  When serving map tiles, we use GeoServer and GeoWebCache to generate the tiles and cache them, respectfully. This performance is great — pre-generated map tiles are the best we can aim for with respect to the base map tiles. Serving geometric features at full resolution, however, introduces a slew of problems. A few that stood out right away:

  • Web Browser Limitations — 9 out of 10 experts agree: too many map features has a significant performance impact on web browsers, with the greatest impact on the Microsoft Internet Explorer browser.
  • Excessive Coordinates — delivering lots of polygon coordinate pairs that the user may never see consumes valuable bandwidth and rendering time.
  • Server Processing Time — recalculating state-wide geometric features consumes valuable CPU time.

Web Browser Limitations

First, we tackled the browser performance issues. A sluggish browser is the kiss of death in the web world, and we had to make the application experience as fast as possible before looking at the server processing time.

We originally gave users the power to create highly detailed districts at the statewide level, but realized that no modern web browser could handle the volume of polygon features that would need to be served to represent an entire state.  In order to mitigate this limitation, we limited the size and number of features sent to the browser. With some scale-dependent logic, a user zoomed in to a detail of a district can finely tune the boundary by moving smaller geographic features (e.g. census blocks), and a user zoomed out to the state-wide level can manipulate the districts by moving large geographic features only (e.g. counties). In addition, when editing the finest details, we limit the number of features a user can move in a single edit.

Excessive Coordinates

The next thing to go was the set of full resolution geometries. In DistrictBuilder, users never actually see the full geometries, but an adaptively simplified (sometimes called generalized) geometry; depending on the scale of the map view, the server will deliver geometries with appropriate coordinate resolutions. Simply put: as you zoom in on the map, you get more detail in the geometries.

By simplifying counties, the geometries are reduced from 166,958 points to 4,821. When a user is zoomed out, there is no noticeable difference between these geometries!  However, as the user is interacting with higher resolution maps, DistrictBuilder loads in higher-resolution geometries on demand. The following images demonstrate the difference in the geometry detail:

Low Resolution Transition

The zoomed in County layer, with a low resolution district overlay (orange line). There are currently 1,414 coordinates in this view of the district overlay.

High Resolution Transition

The zoomed in VTD layer, with a high resolution district overlay (orange line). There are currently 3,253 coordinates in this view of the district overlay.

You can notice the differences in the district detail if you look closely at the orange district boundary. This transition happens seamlessly in the application, loading in the higher resolution geometries as web users zoom in to areas of interest.

We also eliminated coordinates that you never see.  It made no sense to serve  coordinates that were located in the opposite side of the state where a user was editing, just like you wouldn’t expect to get an encyclopedia in the mail when releasing an RFI. With the OpenLayers library, Strategies came in handy here, particularly BBOX.

Server Processing Time

After we had optimized the performance of the user interface, we shifted our focus to the server-side processing.  One of the features that makes DistrictBuilder such a powerful tool is the accuracy of the underlying data and constant feedback of important district statistics. In order to calculate all these statistics on the fly, it is necessary to leverage some tricks already mentioned with respect to map tiles: caching and generalizing.

Computation of the district statistics must happen every time a district boundary is changed. A naive solution to this problem would be to aggregate the values within the boundary every time a change is made.  This approach results in horrible performance. Instead, we just determine what has changed — which areas were added, which areas were removed — and recompute the delta, or change, on the previous district value.

Another trick to optimizing performance is in the way we determine the changing boundaries.  I’ll describe the problem using the census geographies of counties, tracts, and blocks. The structure and detail of the underlying data yielded computationally expensive queries against the block geometries.  We came up with a method of searching for the geographies in a hierarchical fashion — searching the counties first, then continuing to the next smallest-scale geography only if there was any remaining geometry left in the query.  We did the same for the tracts, and took a shortcut at the block level to exclude the block geometries.  This increased server side performance considerably.

King William County

King William county is comprised of 22 Voter Tabulation Districts and 1,527 Census Blocks.

Consider the following scenario: a user wants to move King William County (highlighted in yellow) from District 1, which is over populated, to District 3, which is under populated. Changing the boundaries with all the blocks in King William County would require testing at least 4,000 blocks for spatial intersections, then aggregating 1,527 data values, and recomputing the spatial aggregate (union) of those 1,527 geometries. With our hierarchical approach, we can change the boundary of the district with the county boundary, and change the population totals by the county’s population. A few orders of magnitude fewer operations to perform, and much faster from the user’s perspective.

Lessons Learned

Throughout the DistrictBuilder development process, the same core performance challenge has arisen: the volume of data must be reduced. This applies to all aspects of the application:

  • Map Tiles: pre-render tiles to keep the number of rendered tiles to a minimum at runtime.
  • Map Features: deliver to the browser only as much information as you can see (perhaps even less).
  • Database Queries: do anything possible to ensure that geometric operations are performed on simplified geometries.
  • Aggregating Statistics: cache whatever you can, and only compute the difference from the last cache state.

The above steps reduced the sheer number of operations and volume of processing that both the server and browser need to complete when creating new districts. These are lessons that translate well to any “big data” problem, and are crucial in bringing sophisticated GIS operations to the web.

Using the CQL_FILTER parameter with OpenLayers WMS layers

I’ve used Openlayer’s Marker layer in several projects and have always just accepted that I can’t display more than around 500 markers at a time for a given query. Recently, I found another way. We’re using GeoServer as a WMS tile server for the tree and municipal boundary layers in PhillyTreeMap.org. GeoServer’s WMS implementation allows an additional parameter in the url called CQL_FILTER. This parameter allows you to use a little language called Common Query Language, or CQL, to apply data filters to the tiles that GeoServer generates. CQL is a plain text, human readable query language created by the OGC, but I like to think of it as an extremely limited third-cousin-by-adoption of SQL. I haven’t found too much in the way of documentation on this obscure little gem, so here’s a rundown of how we used it to display search results in PhillyTreeMap.org.

If you look through the CQL and ECQL page in GeoServer’s documentation, there are several examples but they don’t cover everything you can do with CQL. Basically, a CQL filter is a list of phrases similar to where clauses in SQL, separated by the combiner words AND or OR. You can use the following operators in a CQL phrase:

  • Comparison operations: =, <, >, and combinations
  • Math expressions: +, -, *, /
  • NOT
  • IS, EXISTS
  • BETWEEN, BEFORE, AFTER
  • IN
  • LIKE
  • Geometric operators: CONTAINS, BBOX, DWITHIN, CROSS(ES), INTERSECT(S)

Some of these operations have examples in the GeoServer documentation, and others can be inferred from the GeoTools documentation (the stuff in their CQL.toFilter() calls). CQL can also call any of GeoServer’s filter functions.You can add parenthesis as needed to affect the order the filters are evaluated in, just like in SQL.

CQL has a lot of power for such a short spec, but it has a one very large deficiency that requires some database designing to avoid: the utter lack of join support. This makes sense when you consider that GeoServer doesn’t know about joins either. Ultimately, you’re using CQL against the GeoServer layer, not the underlying database structure. Building views or adding reference columns to the table GeoServer is accessing can help get around this.

In PhillyTreeMap.org, we use 4 types of CQL filters: id lookups using =, null checks using IS, date and integer ranges using BETWEEN and text searches using LIKE. Here are examples of those uses along with some array joining to get a valid CQL filter string at the end:

filter_list = []
filter_list.append("species_id = 212")
filter_list.append("height IS NULL")
filter_list.append("dbh BETWEEN 10 and 20")
filter_list.append("neighborhood_id_list LIKE '%42%' ")
cql = ' AND '.join(filter_list)
# should look like this: "species_id = 212 AND height IS NULL AND dbh BETWEEN 10 and 20 AND neighborhood_id_list LIKE '%42%' "

The above CQL filter would locate trees in a specific species, have no height value, only have dbh values between 10 and 20 inches and are in a particular neighborhood. The neighborhood_id_list filter would have been a join if this were written in standard SQL since neighborhoods and trees have a many-to-many relationship in our database. Since we can’t do joins, any time a tree is added or the location is updated, all of it’s related geographies’ ids are added to a reference column on the tree, and used specifically for this type of query.

CQL is passed to GeoServer in the same way as any other WMS variable. We’re using openlayers, so most of the WMS configuration variables are already set when we create the layer. The WMS layer has a little method called mergeNewParams that lets us change those parameters after the layer has been initialized. It also automatically redraws the layer, so the changes take place immediately. To add CQL to the WMS call, just add the CQL_FILTER variable to the layer’s parameters and the layer should update.

wms_layer.mergeNewParams({'CQL_FILTER': "species_id = 212 AND height IS NULL AND dbh BETWEEN 10 and 20 AND neighborhood_id_list LIKE '%42%' "});

You can remove any filters by deleting the parameter from the layer as if it was a normal javascript object. You’ll need to redraw the layer yourself before the change will be visible.

delete wms_layer.params.CQL_FILTER;
wms_layer.redraw();

MapServer wins FOSS4G 2009 WMS Shootout

MapServer is back on top! The results of the WMS shootout at FOSS4G 2009 are out, and MapServer beats GeoServer in terms of speed.

Though it begs the question: who would win a WMS shootout: WMS, WMS, WMS, WMS, or WMS?

http://geoserver.org/display/GEOS/Welcome

Tricks & Woes w/ WMS GetFeatureInfo

Here’s a trick that could be useful if you’re writing a simple map application in JavaScript that interacts with a WMS map service that you control, and your application needs information about a feature that the user has clicked on.   Did you know that you can cajole your WMS server to directly produce JSON (which is a very convenient way to receive data in a JS application) with feature information as a response to a WMS request?  I found this trick useful in a recent application as a simple approach, if one doesn’t mind wrangling with server configuration.  Unfortunately, it involves using an under-specified part of the WMS standard, so this recipe is also a good illustration of where the OGC WMS standard could be improved.

In the WMS specification, OGC defined an “optional” GetFeatureInfo request that returns feature information based on a pixel from a map.  It’s both useful and limiting that the request is based on the pixel — it’s useful in that you can just pass along the pixel that the user clicked on, but you don’t have a lot of control (without other trickery I won’t go into here).  But here’s the catch: the output format isn’t defined in the standard.  But here the problem becomes the opportunity — most WMS servers allow you can define your own output format, which includes JSON.

But again the problem returns to the fact that there isn’t a standard in place here:  different mapservers handle the output format definition differently.  MapServer has a very straightforward method — it will respond to a “text/html” request with three templates you can define: you can define a header template, a content template (that is repeated for each feature), and a footer template.  It will replace any attribute name surrounded with brackets with the value of the attribute.  So if you create a header.html file with a single left bracket (‘[') and a footer.html with a single right bracket (']‘) and then a template.html file that is something like:

{
id: “[id]“,
attribute: “[attribute]“
}

you’ll get responses to your GetFeatureInfo request in proper JSON, e.g.:

“[ { id: "123", attribute: "foo" }, { id: "456", attribute: "bar" } ]“

GeoServer has a different structure for its GetFeatureInfo templates, which is described here but the same basic approach works, with the major difference being that you create a single template with header and footer sections.  I’m not sure how it’s done with ArcGIS Server, but with ArcIMS there are XSL files for each output format, and you could create one that generates JSON without too much trouble if you are familiar with XSL.  If someone figures out how to do this with ArcGIS Server, I’ve love to know.

Of course, what would really be fantastic would be if the OGC standard was extended to include a standard output format — I won’t even dare to dream for a standard way to define the output format.