Building Districts in Web-Time

DistrictBuilder logoMost recently, the Politics, Redistricting and Elections team has been working closely with the Public Mapping Project to build DistrictBuilder, an open source, web-based application that enables regular citizens to use powerful tools to draw their own legislative districts. If you’ve seen how badly the professionals can mangle districts (Exhibit A, Exhibit B, etc), it’s easy to imagine that any given citizen, given the right tools, could do it better.

We spent quite a bit of time making the application easy to use and responsive in modern desktop web browsers.  The “easy to use” part was tackled by our excellent UI/UX design team. The “responsive” part was the domain of  our engineers.  That’s where the fun began for me.

DistrictBuilder is designed to use any polygon shapefile, transform it into an internal data model, then make that accessible via map tiles and geometric features.  When serving map tiles, we use GeoServer and GeoWebCache to generate the tiles and cache them, respectfully. This performance is great — pre-generated map tiles are the best we can aim for with respect to the base map tiles. Serving geometric features at full resolution, however, introduces a slew of problems. A few that stood out right away:

  • Web Browser Limitations — 9 out of 10 experts agree: too many map features has a significant performance impact on web browsers, with the greatest impact on the Microsoft Internet Explorer browser.
  • Excessive Coordinates — delivering lots of polygon coordinate pairs that the user may never see consumes valuable bandwidth and rendering time.
  • Server Processing Time — recalculating state-wide geometric features consumes valuable CPU time.

Web Browser Limitations

First, we tackled the browser performance issues. A sluggish browser is the kiss of death in the web world, and we had to make the application experience as fast as possible before looking at the server processing time.

We originally gave users the power to create highly detailed districts at the statewide level, but realized that no modern web browser could handle the volume of polygon features that would need to be served to represent an entire state.  In order to mitigate this limitation, we limited the size and number of features sent to the browser. With some scale-dependent logic, a user zoomed in to a detail of a district can finely tune the boundary by moving smaller geographic features (e.g. census blocks), and a user zoomed out to the state-wide level can manipulate the districts by moving large geographic features only (e.g. counties). In addition, when editing the finest details, we limit the number of features a user can move in a single edit.

Excessive Coordinates

The next thing to go was the set of full resolution geometries. In DistrictBuilder, users never actually see the full geometries, but an adaptively simplified (sometimes called generalized) geometry; depending on the scale of the map view, the server will deliver geometries with appropriate coordinate resolutions. Simply put: as you zoom in on the map, you get more detail in the geometries.

By simplifying counties, the geometries are reduced from 166,958 points to 4,821. When a user is zoomed out, there is no noticeable difference between these geometries!  However, as the user is interacting with higher resolution maps, DistrictBuilder loads in higher-resolution geometries on demand. The following images demonstrate the difference in the geometry detail:

Low Resolution Transition

The zoomed in County layer, with a low resolution district overlay (orange line). There are currently 1,414 coordinates in this view of the district overlay.

High Resolution Transition

The zoomed in VTD layer, with a high resolution district overlay (orange line). There are currently 3,253 coordinates in this view of the district overlay.

You can notice the differences in the district detail if you look closely at the orange district boundary. This transition happens seamlessly in the application, loading in the higher resolution geometries as web users zoom in to areas of interest.

We also eliminated coordinates that you never see.  It made no sense to serve  coordinates that were located in the opposite side of the state where a user was editing, just like you wouldn’t expect to get an encyclopedia in the mail when releasing an RFI. With the OpenLayers library, Strategies came in handy here, particularly BBOX.

Server Processing Time

After we had optimized the performance of the user interface, we shifted our focus to the server-side processing.  One of the features that makes DistrictBuilder such a powerful tool is the accuracy of the underlying data and constant feedback of important district statistics. In order to calculate all these statistics on the fly, it is necessary to leverage some tricks already mentioned with respect to map tiles: caching and generalizing.

Computation of the district statistics must happen every time a district boundary is changed. A naive solution to this problem would be to aggregate the values within the boundary every time a change is made.  This approach results in horrible performance. Instead, we just determine what has changed — which areas were added, which areas were removed — and recompute the delta, or change, on the previous district value.

Another trick to optimizing performance is in the way we determine the changing boundaries.  I’ll describe the problem using the census geographies of counties, tracts, and blocks. The structure and detail of the underlying data yielded computationally expensive queries against the block geometries.  We came up with a method of searching for the geographies in a hierarchical fashion — searching the counties first, then continuing to the next smallest-scale geography only if there was any remaining geometry left in the query.  We did the same for the tracts, and took a shortcut at the block level to exclude the block geometries.  This increased server side performance considerably.

King William County

King William county is comprised of 22 Voter Tabulation Districts and 1,527 Census Blocks.

Consider the following scenario: a user wants to move King William County (highlighted in yellow) from District 1, which is over populated, to District 3, which is under populated. Changing the boundaries with all the blocks in King William County would require testing at least 4,000 blocks for spatial intersections, then aggregating 1,527 data values, and recomputing the spatial aggregate (union) of those 1,527 geometries. With our hierarchical approach, we can change the boundary of the district with the county boundary, and change the population totals by the county’s population. A few orders of magnitude fewer operations to perform, and much faster from the user’s perspective.

Lessons Learned

Throughout the DistrictBuilder development process, the same core performance challenge has arisen: the volume of data must be reduced. This applies to all aspects of the application:

  • Map Tiles: pre-render tiles to keep the number of rendered tiles to a minimum at runtime.
  • Map Features: deliver to the browser only as much information as you can see (perhaps even less).
  • Database Queries: do anything possible to ensure that geometric operations are performed on simplified geometries.
  • Aggregating Statistics: cache whatever you can, and only compute the difference from the last cache state.

The above steps reduced the sheer number of operations and volume of processing that both the server and browser need to complete when creating new districts. These are lessons that translate well to any “big data” problem, and are crucial in bringing sophisticated GIS operations to the web.

Pending edit system using Django

A common concern when we talk to people about OpenTreeMap is how much to trust the public with an organisation’s tree inventory. Every implementation of this open source system has a different answer. The original site, UrbanForestMap.org, allows a logged-in user to edit almost every bit of information they gather about a tree. PhillyTreeMap.org requires a certain level of reputation before a user can edit everything, but even a new user has considerable edit capabilities. The most recent implementation (still a work-in-progress) introduces a bit of oversight to public edits. The managing group wanted to double-check changes to officially inventoried trees, but didn’t want to get in the way of people adding and editing their own trees.

Lets look at how this changes the user story first:

A logged-in user makes an edit to a tree. The system needs to decide if these changes are applied to the tree or placed in a pending queue. If this is a publicly-entered tree, the changes are applied to the tree. (Start new requirements) If this is an inventory tree, and the user isn’t a member of a management group, add the change to the pending queue. Display any pending changes reasonably near the appropriate current value. (End new requirements)

Most of this happens behind the scenes in the saving logic. I added a bit of code to the top of our tree updater to check if the pending system is active, the user’s permissions and the tree’s origins. If everything checks out, the change goes straight into the updater code. For changes that go into the pending queue, the path to becoming an official change is a little more tortuous.

Since we’re storing these changes for later review, they have to go into the database. I created a new table to hold onto the original tree’s id, the field being changed and the new value as well as the user who submitted it, a date/time stamp and a status field. Each pending change is stored separately; even if the user makes more than one change to the tree, each ‘pend’ can be applied individually.

The rest of the pending system is eye-candy and a bit of slightly tedious templating. Almost every field on a tree’s detail page now needs to check two new things: are there any pending changes for this field, and does this user have permission to approve/disapprove pending edits. If there are pending edits, the new values are added below the current official value. When a managing user views the page, small approve and disapprove buttons also appear next to each pending change. Throw in a management-access-only page for some bulk evaluation and the system is complete!

Bring on the data focused basemaps, Esri

It’s great to read that Esri is working on features and basemaps to include within ArcGIS.com to support data visualization.     Sometimes the map isn’t the focus; sometimes the data is the focus.  A few weeks ago, Bern Szukalski wrote an article for the Esri Insider blog that spoke about Esri’s efforts to create new basemaps including basemaps for data visualization purposes.   I think this is a great move for Esri.   Last fall, I suggested such a muted basemap via the ideas.arcgis.com portal, so I was quite excited to hear it is in the works.

A post today, also by Bern, explained a new feature within ArcGIS.com to allow the user to mute the basemap by adjusting it’s display transparency.    The HunchLab team stumbled upon this idea a few months back and it’s been a great way to use the existing topographic basemap.    In our demo instance of HunchLab, we are using the ArcGIS.com Topographic tiles set to a transparency of 60%.   You can see what it looks like below.

Kudos to Esri — keep the basemap options coming!

Scala’s Numeric type class (Pt. 2)

In my last blog post I introduced the idea of type classes (and scala.math.Numeric in particular). I demonstrated basic usage but also alluded to some problems. In this blog post I will explain the problems, talk about specialization, and introduce a version of Numeric that solves many of the problems (com.azavea.math.Numeric). I will also present some benchmarking numbers.

Icky Syntax

The most noticeable problem when using Numeric (at least in 2.8) is the syntax. As Kevin Wright alluded to in a comment on the last post, you can include the Numeric type class in two ways: via a context bound on your type parameter (A) or as an implicit argument.

In my earlier examples I used the implicit argument because I thought it was easier to follow (especially for those who are new to implicits) and it bound the instance of the type class to a named variable (n). But you can also use the type bound syntax, which makes the function declaration a bit cleaner looking.

// this makes it obvious that n is an instance of Numeric[A]
// and gives us a reference to use.
def square[A](a:A)(implicit n:Numeric[A]) = {
  n.times(a, a)
}
 
// this definition is a bit cleaner looking.
// implicitly[] searches for an implicit Numeric[A] instance.
// unfortunately the body of the function is less clean.
def square[A:Numeric](a:A) = {
  implicitly[Numeric[A]].times(a, a)
}

Either way, you’re still writing things like n.times(a, a) rather than a * a. Fortunately, Scala 2.9 introduced some nice new implicit parameters that allow us to use infix operators:

import Numeric.Implicits._
def square[A:Numeric](a:A) = a * a

Confusing Conversions

The implicits providing infix operators are definitely a big improvement! But while they are nice for simple cases like the previous one they can end up confusing the issue a little bit:

def twice[A:Numeric](a:A) = a * 2
//error: could not find implicit value for parameter num:
// scala.math.Numeric[Any]

What’s going on here? The problem is that * is mapped to Numeric.times(a:A, b:A) but the second argument we provided isn’t an A, it’s an Int. In Java (and Scala) we are used to being able to mix numeric types like double and int without worrying: the result will be promoted to double (the type with the widest range). Unfortunately, as written Numeric isn’t able to do these kinds of things and instead types must be converted explicitly. Numeric does provide two special methods, zero and one which allow you to get those values for all its types.

// use Numeric.one to get the correct type
def once[A:Numeric](a:A):A = a * implicitly[Numeric[A]].one
 
// convert 2 to be the correct type
def twice[A:Numeric](a:A):A = a * implicitly[Numeric[A]].fromInt(2)
 
// convert a to be an Int
def thrice[A:Numeric](a:A):Int = a.toInt * 3

There’s also no good way to combine two different generic Numeric types, other than converting to a common concrete type (e.g. Double). You could imagine Numeric providing something like fromType[B] (as well as things like fromDouble) but currently it doesn’t.

def combine[A:Numeric, B:Numeric](a:A, b:B) = a + b
//error: could not find implicit value for parameter num:
// scala.math.Numeric[Any]
 
// this works
def combine[A:Numeric, B:Numeric](a:A, b:B) = a.toDouble + b.toDouble
 
// this would be nice (for some definition of nice)
// but it doesn't work
def combine[A:Numeric, B:Numeric](a:A, b:B):A = {
  a + implicitly[Numeric[A]].fromType[B](b)
}

Performance

The most disappointing thing (in my opinion) about using scala.math.Numeric is its performance. In some cases it performs adequately, but in most cases it is very slow: actual algorithms written with Numeric tend to be 6-30 times slower than their direct counterparts (depending on what exactly they do). Some rough observations are that the comparison operators (e.g. >) seem to be slower than the math operators (e.g. *), and performance does depend on which type you end up using: Numeric[Double] does relatively well (compared to the same code written for Double). Numeric[Int] does very badly compared to code written for Int.

Specialization to the Rescue!

Fortunately, Scala itself contains the tool to fix Numeric’s performance problems: specialization. Specialization was first described in the paper Compiling Generics Through User-Directed Type Specialization by Iulian Dragos and Martin Odersky and was introduced in Scala 2.8 via the @specialized annotation. Other articles have been written explaining specialization, but I will give another brief one here.

When writing generic functions in Scala (as well as Java), the compiler generates one implementation that supports all possible argument types (e.g. java.lang.Object). This fine for Java [1] since all the valid types you could use with your generic function are guaranteed to be subclasses of java.lang.Object. If you remember my first post, you’ll recall how we had to create boxed instances of java.lang.Integer to hold int values when using generics.

Unfortunately for Scala it means that using generic functions with value-types (e.g.Int) is much slower than functions implemented directly in terms of the value-type itself, since there is a single implementation which requires its arguments to be objects (e.g. boxed).

The @specialized annotation allows the Scala compiler to generate separate implementations of the generic function: one for all the reference types (inheriting fromAnyRef) and one for each of the value types we choose to specialize over (e.g. Char, Double, Int). So if we have a generic function foo[A](a:A): A the compiler will generate other functions on primitive types, for instance fooInt(a:Int): Int, fooDouble(a:Double): Double, etc. And when we write foo(33) the compiler will translate that code into fooInt(33).

Since Scala hides boxing/unboxing, specialization doesn’t make the code look nicer. But it makes a huge difference in performance: Dragos and Odersky report a 22-40x speed up in micro-benchmarks!

Specializing Numeric

Soon after specialization was added to Scala 2.8, people began discussing specializing Numeric. Searching Google I find names like Jason Zaugg, Iulian Dragos, and others talking about it. Andreas Flierl emailed the scala-language mailing list on January 3, 2011 with some early tests which got me interested in working on this. So I don’t want to claim any credit for the amazing work done on implementing @specialized or even the idea of specializing Numeric.

Starting with Andreas’ code I began working on building a specialized Numeric, with the goal of making all its operations as fast as direct implementations. There were a bunch of ways that the code needed to be restructured (which I may discuss in a later article) and I learned a ton about implicits, typeclasses, and profiling Scala code. The code is available at: https://github.com/azavea/numeric along with the tests I’m running. The results follow, but the short story is that with the exception of implicit infix operators all com.azavea.math.Numeric code runs just as fast as the direct implementations.

The Numbers

test direct (ms) new (ms) old (ms)
from-int-to-int 22.3 22.0 0.99x 169.1 7.60x
from-int-to-long 38.0 37.3 0.98x 244.5 6.43x
from-int-to-float 23.1 23.0 0.99x 202.5 8.76x
from-int-to-double 40.0 37.9 0.95x 271.4 6.78x
adder-int 12.6 13.5 1.07x 145.3 11.50x
adder-long 21.1 21.4 1.01x 153.8 7.28x
adder-float 27.5 27.8 1.01x 27.5 1.00x
adder-double 27.3 26.0 0.95x 27.5 1.01x
array-total-int 8.3 8.3 1.00x 246.4 29.86x
array-total-long 11.5 11.1 0.97x 310.8 27.02x
array-total-float 16.1 16.6 1.03x 217.1 13.47x
array-total-double 17.3 16.8 0.97x 216.8 12.57x
array-rescale-int 27.1 26.8 0.99x 357.8 13.19x
array-rescale-long 24.6 24.6 1.00x 540.6 21.95x
array-rescale-float 59.3 59.6 1.01x 366.9 6.19x
array-rescale-double 91.4 91.1 1.00x 394.3 4.31x
find-max-int 16.6 16.9 1.02x 166.0 9.98x
find-max-long 12.1 12.1 1.00x 199.6 16.46x
find-max-float 22.6 22.4 0.99x 187.8 8.30x
find-max-double 23.9 23.5 0.98x 195.6 8.19x
quicksort-int 155.9 820.1 5.26x 929.0 5.96x
quicksort-long 1105.0 1326.4 1.20x 1238.6 1.12x
quicksort-float 228.6 1493.0 6.53x 1364.5 5.97x
quicksort-double 251.0 1470.3 5.86x 1306.3 5.20x
array-allocator-int 68.4 73.3 1.07x 42.6 0.62x
array-allocator-long 101.9 102.4 1.00x 143.3 1.41x
array-allocator-float 87.4 82.5 0.94x 146.8 1.68x
array-allocator-double 82.9 83.0 1.00x 162.4 1.96x
insertion-sort-int 47.5 48.6 1.02x 1463.0 30.80x
insertion-sort-long 49.9 50.3 1.01x 1440.6 28.88x
insertion-sort-float 50.0 50.9 1.02x 1658.6 33.17x
insertion-sort-double 49.3 49.9 1.01x 1850.5 37.57x
merge-sort-int 275.6 293.0 1.06x 870.9 3.16x
merge-sort-long 288.3 215.0 0.75x 842.9 2.92x
merge-sort-float 186.8 206.5 1.11x 901.3 4.83x
merge-sort-double 197.6 218.8 1.11x 895.5 4.53x
infix-adder-int 20.5 57.1 2.79x 125.9 6.14x
infix-adder-long 20.8 62.3 3.00x 250.6 12.08x
infix-adder-float 20.9 33.9 1.62x 176.4 8.45x
infix-adder-double 28.8 32.9 1.14x 189.0 6.57x


Discussion

Profiling this was a little bit annoying–I had to write 4 direct implementations (one for each of the types I was interested in) plus two generic implementations (one using the old scala.math.Numeric and one using the new com.azavea.math.Numeric). I used scala.testing.Benchmark although Yuvi Masory recently suggested using https://github.com/sirthias/scala-benchmarking-template (which I still need to go check out).

Some things to note:

  1. Old Numeric does much better on floating-point numbers than on integral ones. Why?
  2. For some reason old Numeric is able to beat the direct implementation of the array-allocator test.
  3. Since scala.util.Sorting[T] uses scala.math.Ordering[T] (which isn’t specialized) it’s not possible to make Numeric any faster in this test.
  4. scala.util.Sorting[Long] is over 4x slower than Int, Float and Double in the direct implementation.
  5. Infix operators are still slow enough that I still have mixed feelings endorsing them.

One additional gotcha which I haven’t discussed at length is what writing your own specialized numeric code (currently) looks like. Unfortunately, there’s a lot of boiler-plate. While users of your library will just say doSomethingComplicated(Array(0, 1, 2), 18, 99) you will be stuck writing:

def doSomethingComplicated[@specialized A:Numeric:Manifest]
(ns:Array[A], x:A, y:A) = {
  // implementation here
}

And that’s only one type parameter; God forbid you want A and B to both be Numeric!

I don’t know of any easy way around this. You have to say @specialized if you want your code to be specialized. If you fail to do this anywhere in the call chain you will use generic implementations from that point down, losing any benefit to specialization. You need to specify Numeric because that’s the point, right? And when writing code that needs to use the type parameter A to allocate arrays (among other things) you have to provide a Manifest. This gets ugly really fast.

Given my ignorance of the Scala compiler, I can imagine various magical things that might fix this. For instance, I could imagine some kind of synthetic type bound that combines others, e.g. FastNumeric = @specialized Numeric:Manifest or something. Is this possible (or even a good idea)? I don’t know.

Changes

This section is a bit rushed–I may expand it into another post later, but I just wanted to get the information out there.

One of the biggest changes I made to Numeric was tearing out scala.math.Ordering–I didn’t want to try to specialize “everything at once” but scala.math.Numeric inherits all of its (incredibly slow) comparison methods from Ordering. Specializing Ordering would solve this problem.

I also created a specialized typeclass with the humorous name “Convertable” which applies to all the value types which represent numbers (Byte, Short, Int, Long, Float and Double). The great thing about this is that you can avoid doing boxing/unboxing during conversion between types. It also allows you to implement the fromType[T] method I discussed earlier.

I left out Short and Byte after Paul Phillips pointed out that neither classes operations behave as the direct implementations do. In Scala (as in Java) Byte + Byte -> Int, whereas in Numeric A + A -> A (so Byte + Byte -> Byte). This causes all kinds of bugs. Since Short and Byte (and Char) are converted to Int when you actually do things with them, I decided it was unlikely users wanted to be able to use them in generic numeric functions.

A (potentially) controversial change I made was not including scala.math.Integral and scala.math.Fractional, and unifying support for the division and mod operators. Coming from dynamically-typed languages, I expected to be able to do the following:

// this does not work with the built-in Numeric
import Numeric.Implicits._
def scale[A:Numeric](n:A, num:A, denom:A) = (n * num) / denom
 
// this *does* work
import scala.math.Integral.Implicits._
def scale[A:Integral](n:A, num:A, denom:A) = (n * num) / denom

I understand that for someone who’s excited about creating user-defined numeric types this split isn’t a big deal. But from my perspective I want a generic numeric type that abstracts across all the useful value-types in Scala. I’m not opposed to having Integral and Fractional, but I would rather that Numeric not be defined in terms of them.

Finally I did some restructuring to avoid using inner traits and classes, since I had heard that those don’t specialize well.

Conclusion

While I still think there’s some room to make Numeric better, at this point it’s possible to write fast generic Numeric code in Scala. Stepping back, I think it’s incredibly exciting that I can avoid code duplication *and* maintain the speed that I’m used to from direct implementations.

I’m not sure what the best path is to add this capability to scala.math.Numeric. I think the trait lacks some important features (more type conversions, division, etc) and some of the restructuring is currently necessary to get these numbers. Since binary compatibility is important moving forward for Scala and the Typesafe team it’s not entirely clear what the best plan is. Hopefully some discussion and work at Scalathon can clarify things a bit.

Please respond with any questions or comments. Do let me know if you find problems in the profiling, or try it yourself and get dramatically different numbers. I spent some time trying not to fall into obvious micro-benchmark pitfalls, but there’s still a lot I don’t know about the JVM. Thanks for reading!

Using the CQL_FILTER parameter with OpenLayers WMS layers

I’ve used Openlayer’s Marker layer in several projects and have always just accepted that I can’t display more than around 500 markers at a time for a given query. Recently, I found another way. We’re using GeoServer as a WMS tile server for the tree and municipal boundary layers in PhillyTreeMap.org. GeoServer’s WMS implementation allows an additional parameter in the url called CQL_FILTER. This parameter allows you to use a little language called Common Query Language, or CQL, to apply data filters to the tiles that GeoServer generates. CQL is a plain text, human readable query language created by the OGC, but I like to think of it as an extremely limited third-cousin-by-adoption of SQL. I haven’t found too much in the way of documentation on this obscure little gem, so here’s a rundown of how we used it to display search results in PhillyTreeMap.org.

If you look through the CQL and ECQL page in GeoServer’s documentation, there are several examples but they don’t cover everything you can do with CQL. Basically, a CQL filter is a list of phrases similar to where clauses in SQL, separated by the combiner words AND or OR. You can use the following operators in a CQL phrase:

  • Comparison operations: =, <, >, and combinations
  • Math expressions: +, -, *, /
  • NOT
  • IS, EXISTS
  • BETWEEN, BEFORE, AFTER
  • IN
  • LIKE
  • Geometric operators: CONTAINS, BBOX, DWITHIN, CROSS(ES), INTERSECT(S)

Some of these operations have examples in the GeoServer documentation, and others can be inferred from the GeoTools documentation (the stuff in their CQL.toFilter() calls). CQL can also call any of GeoServer’s filter functions.You can add parenthesis as needed to affect the order the filters are evaluated in, just like in SQL.

CQL has a lot of power for such a short spec, but it has a one very large deficiency that requires some database designing to avoid: the utter lack of join support. This makes sense when you consider that GeoServer doesn’t know about joins either. Ultimately, you’re using CQL against the GeoServer layer, not the underlying database structure. Building views or adding reference columns to the table GeoServer is accessing can help get around this.

In PhillyTreeMap.org, we use 4 types of CQL filters: id lookups using =, null checks using IS, date and integer ranges using BETWEEN and text searches using LIKE. Here are examples of those uses along with some array joining to get a valid CQL filter string at the end:

filter_list = []
filter_list.append("species_id = 212")
filter_list.append("height IS NULL")
filter_list.append("dbh BETWEEN 10 and 20")
filter_list.append("neighborhood_id_list LIKE '%42%' ")
cql = ' AND '.join(filter_list)
# should look like this: "species_id = 212 AND height IS NULL AND dbh BETWEEN 10 and 20 AND neighborhood_id_list LIKE '%42%' "

The above CQL filter would locate trees in a specific species, have no height value, only have dbh values between 10 and 20 inches and are in a particular neighborhood. The neighborhood_id_list filter would have been a join if this were written in standard SQL since neighborhoods and trees have a many-to-many relationship in our database. Since we can’t do joins, any time a tree is added or the location is updated, all of it’s related geographies’ ids are added to a reference column on the tree, and used specifically for this type of query.

CQL is passed to GeoServer in the same way as any other WMS variable. We’re using openlayers, so most of the WMS configuration variables are already set when we create the layer. The WMS layer has a little method called mergeNewParams that lets us change those parameters after the layer has been initialized. It also automatically redraws the layer, so the changes take place immediately. To add CQL to the WMS call, just add the CQL_FILTER variable to the layer’s parameters and the layer should update.

wms_layer.mergeNewParams({'CQL_FILTER': "species_id = 212 AND height IS NULL AND dbh BETWEEN 10 and 20 AND neighborhood_id_list LIKE '%42%' "});

You can remove any filters by deleting the parameter from the layer as if it was a normal javascript object. You’ll need to redraw the layer yourself before the change will be visible.

delete wms_layer.params.CQL_FILTER;
wms_layer.redraw();

Scala’s Numeric type class (Pt. 1)

So I’ve been spending my R&D time recently working on improving Scala’s Numeric type class. We’re about to release an improved version of Numeric, and there’s also an incubator project aimed at getting it integrated into the Scala project. To explain the project (and in preparation for Scalathon in July) I have embarked on writing some blog posts to explain what Numeric is, what it does, and how it can be made better.

These posts will assume you know a bit of Scala, but I will try to keep the discussion as general as I can.

The Goal

The goal seems simple: to implement an algorithm but without choosing in advance which numeric type (e.g. int) to use, and without paying a performance penalty over a “direct” implementation.

Most dynamically-typed languages get this “for free” (in the sense that they pay a performance penalty regardless) and some statically-typed languages can accomplish this with macros, but I will focus on another approach: type classes. I will explain how Scala is able to transcend its Java roots and accomplish this task using the Numeric type class.

Java background

If you work with Java, you probably never write functions which can operate on both integral and floating point values.

Or at least, you probably never support both with one function–in that case you’d have to use java.lang.Number, which is implemented by Java’s boxed primitive types (java.lang.Integer, java.lang.Double) and which only gives you accessors for the various primitive types. It’s a mess:

public class Foo {
    public static Number add(Number t, Number u) {
        return new Double(t.doubleValue() + u.doubleValue());
    }
 
    public static void main(String[] args) {
        Number a = Foo.add(new Integer(3), new Integer(4));
        System.out.println(a.intValue());
    }
}

This code won’t work with some long values, and it basically forces the implementation to choose a primitive type that it hopes will work with the values it will be given (or to use java.math.BigDecimal for everything and incur even more slowness).

You might just support double and call it a day. Or you might write one version to deal with int and then copy-paste the same code but change all the int types to double, etc.. No matter what, the whole exercise feels icky.

Scala background

Scala tries to cover up some of Java warts by combining the notion of primitive types (int) and boxed types (java.lang.Integer) into a single unified type (Int). Scala’s Int type uses int when possible, but will transform them into java.lang.Integer instances behind the scenes if necessary. This makes boxing less annoying to do, but doesn’t make it any faster.

It also doesn’t directly help us accomplish our goal. Int and Double don’t share an interface which exposes things like + so we can’t usefully abstract across their types. In fact, they don’t share any superclass short of AnyVal so we can’t even implement something like our Java solution. (This isn’t totally correct–we can use Java’s types from Scala–but you get the idea.) So what gives? Is Scala even worse than Java when it comes to generic numbers?

No. Scala has more tools at its disposal than Java, and we can get around this problem using one of them: type classes.

Type classes in Scala

In Java, if you want a class to satisfy an interface, the class must be written with knowledge of the interface. When you create new interfaces, you have to go back and modify any classes which should implement them. If you’re using someone else’s classes and they don’t know about your interface, too bad!

This makes using other people’s code harder, and also locks you into someone else’s (possibly poorly-thought-out) type hierarchies. While I am not claiming that the type hierarchy for Scala’s numeric types is poorly-thought-out, it is certainly inconvenient for what we want to accomplish.

Type classes (introduced in the Haskell programming language) offer an alternative approach. Type classes are not interfaces which classes must implement. Instead a type class requires an instance to “glue” each member to the type class. When I say “glue” the member classes to the type class, I mean that each instance will provide the methods that the type class requires in terms of a particular member class. For example, if I have a Flammable type class and a House class, I can create a HouseIsFlammable instance that specifies how the House class satisfies Flammable.

We can implement type classes in Scala using implicit parameters. The basic idea is that we create a trait (Flammable[T]), along with implicit objects which “glue” the supported classes to new our type class (e.g. WoodIsFlammable). Each object corresponds to a single supported class (e.g. Wood). It implements the trait in terms of that class (e.g. Flammable[Wood]), and implements the trait’s methods (e.g. burn) in terms of the supported class.

Then when we want to use the type class, we make sure the instances are in scope and Scala’s compiler will automatically pick up those instances for us.

case class Alcohol(liters:Double)
case class Water(liters:Double)
 
case class Fire(heat:Double)
trait Flammable[A] {
  def burn(fuel:A): Fire
}
 
implicit object AlcoholIsFlammable extends Flammable[Alcohol] {
  def burn(fuel:Alcohol) = Fire(120.0)
}
 
def setFire[T](fuel:T)(implicit f:Flammable[T]) = f.burn(fuel)
 
setFire(Alcohol(1.0)) // ok
setFire(Water(1.0)) // FAIL

This is great! Notice that our Alcohol class doesn’t have to know that it is flammable; we just need to make sure that AlcoholIsFlammable is in scope whenever we want to pass Alcohol to setFire().

This is the exact approach that Scala takes: it defines a Numeric type class along with implicit objects that glue various types (like Int) to the type class.

For instance in Scala 2.8 (and later) this function will square the given number:

def square[T](a:T)(implicit n:Numeric[T]) = n.times(a, a)

Notice that we call the methods we’re interested in (e.g. times()) via the instance of our type class (n) and not the actual values that belong to the typeclass (a). Also, we don’t use a traditional type bound on T, but rather, requiring that an implicit object of type Numeric[T] can be found.

Numeric is part of the Standard Scala Library but unfortunately it does not seem to be widely used. I believe there are several reasons for this:

  1. You can’t do this stuff in Java so people are used to working around the problem.
  2. Using type classes in Scala can be a little bit confusing and cumbersome.
  3. Scala’s Numeric lacks important features (e.g. division).
  4. Scala’s Numeric is slow.

In my next blog post, I will explain how to use the Numeric type class in more detail, point out the problems I’m alluding to, and present our improved version along with some benchmarking numbers.

Using django_sorting without text anchors

In creating the OpenDataPhilly website, we knew we needed to pay extra attention to the usability and features available on the search page. After all, what use is a data catalog if you can’t rely on the search results? While designing the page, we decided we wanted to include several ways to sort and filter results along with the standard text search. I’d used directeur’s django_sorting module before, and was highly impressed at how well it integrated with other info already in search parameters, so I decided to use it again. The only thing keeping me from seamlessly dropping in this module was our desire to use images instead of words for the “click on this to sort” link. Django_sorting was built with table headers in mind; so much so that the examples are all about table header tags with a link inside them. We had a slightly different implementation in mind.

The first hurdle that you might think of would be to not require a table structure. Thankfully, django_sorting doesn’t care how you display the data, it only cares about the fields you want to sort on. You can put the sorting links anywhere and it just works. So far, so good.

The second possible hurdle here is that the anchor template tag specification calls for two parameters: the field to sort on, and a string for the link. Since we didn’t want a text link, we really didn’t care about the second parameter. To my surprise, neither did django_sorting! So our anchor template tags look something like this:

<li id="sort_rating_score">{% anchor rating_score %}</li>

This winds up creating a link that still has something for the title attribute and for the inner text:

<li ...><a title="Rating_score" href="/blah/blah/?sort=rating_score">Rating_score</a></li>

So our template tag is nice and clean, but we still have to deal with some text in the link. Using either straight-up javascript, or some smaller and nicer jQuery, removing the innerHTML is easy so long as the dom can uniquely identify our sort links. I just gave the link’s parent container an id and cleared the innerHTML of each link. At the same time, I added a class and some css to define the image and size.

So now we have a django_sorting anchor with an image instead of the default text link. All done, right? Not quite. We didn’t just want to use an image here, we wanted some mouseover and focus sprite action, too. Another chuck of jQuery, and a querystring plugin, helps us out again:

$("#sort .sort_image").each(function () {
    $(this).hover(function() {
        this.style.backgroundPosition="0 -89px"; //the hover image location
    }, function () {
        var filter_split = this.parentNode.id.split('sort_');
        if ($.query.get('sort') && $.query.get('sort') == filter_split[1]) {
            this.style.backgroundPosition="0 -45px";  //the active image location
        } else {
            this.style.backgroundPosition="0 0"; //the non-active image location
        }
    });
 });