In October, GIS and Cartographic Modeling by Dana Tomlin was re-released, and Azavea now has a few copies in the office. I was reading through one of the copies and got into a conversation with Josh Marcus, the lead engineer on our GeoTrellis team, which is working towards a new release (0.8) that incorporates many of the map algebra concepts detailed in the book. Over a series of blog posts, we will take a look at map algebra functions (that are currently or will be soon incorporated into GeoTrellis) by creating a site suitability model using wildlife habitat preferences.

Finding the Jackalope

For this model, I have selected recently available data of nationwide sitings of the Western Jackalope* and descriptions of their habitat preferences by a reputable wildlife biologist**. We will begin the analysis by defining a study area in a region of high jackalope population, then derive several habitat preference layers using map algebra and different geographic data sets that are publicly available.

Our first exercise will be running a kernel density estimation (KDE) on this point distribution to identify an optimal study area in which to focus. Some of the points represent multiple sitings and are coded as such in their attributes. This will affect the KDE, so it’s important to know. KDE is a statistical method for estimating probability density of a variable. When approached mathematically, it bears a close relationship to a histogram, but with an adjustable “kernel” bandwidth that produces a smooth curve. This bandwidth is effectively a radius when applied to points in two-dimensional space.

I am running these calculations in ArcGIS using the Spatial Analyst extension. When preparing the calculation, the software requests a population field (in this case, the number of sitings in a particular location) and a search radius. If the search radius is too small, the resulting density map will be too localized. If it’s too large, the result won’t provide enough detail. Since we’re dealing with a geographic extent of several Western states and a relatively low count of sitings, I found that a search radius of 100 kilometers provided the appropriate level of smoothing. Getting the right kernel size can sometimes be a trial and error process.

Kernel Tomlin’s Focal Brigade

While KDE is not specifically referenced in Tomlin’s book, it is an example of what he refers to as FocalDistribution. FocalDistribution is a type of neighborhood calculation that results in a “sum of contributions from all locations within its neighborhood, each of which contributes a portion of its value … to the locations around it such that those portions diminish with distance.” So the farther from a jackalope siting we go, the lower the value of that siting. Each of the siting locations have these bubbles of value, decreasing with distance. If we were to add them all together, it would look like the result of the KDE analysis.

In the Kernel Density tool included in the ArcGIS Spatial Analyst extension, the user is presented with a few optional parameters and two critical ones: a “search radius” and a “population field”. The “population field” parameter provides the ability to specify a population to be spread across the kernel in order to create a more continuous surface. It can be confusing because, while population is listed as “required” and search radius is listed as “optional”, a population field is not actually necessary, and a value of “NONE” is very common for things like crime and other locations that don’t include observations with a count. Search radius value actually is required, and Spatial Analyst simply uses a default value if you don’t specify one (according to the docs, the default is “the shortest of the width or height of the extent of input features in the output spatial reference, divided by 30”).

While I was working with points in this example, Esri’s Kernel Density tool also supports line density, which is pretty useful if you’d like to generate a “density of road network” or similar surface based on a polyline layer. One limitation of Esri’s implementation is that it assumes a “Gaussian” or normal distribution to the histogram that defines the kernel, but KDE operations in other implementations sometimes offer alternative distributions, such as quadratic or even custom kernels. Some implementations even support an “adaptive kernel” that changes in size based on local conditions. It is important to note that Tomlin’s conceptual framework for Focal operations incorporates a much broader set of options than is commonly implemented in GIS software, and FocalDistribution is no exception. The operation can be modified through the use of the following keywords:

The keywords “at DISTANCE by DIRECTION” supports generalization of the normally circular kernel neighborhoods to include alternative kernel shapes like doughnuts and wedges.
The “FocalDistribution radiating” operation generalizes even further to include non-linear neighborhoods that include line-of-site.
The “FocalDistribution spreading” adds support for travel-cost neighborhoods

The latter couple of options are particularly interesting as they suggest a type of non-Euclidean Kernel Density that could incorporate concept of “friction” through which the density is computed. In personal correspondence with Tomlin related to Azavea’s NSF-supported research into GPU-based raster computation, Tomlin writes:

To the best of my knowledge, there has never been a non-Euclidean version of Kernel Density: one that would redistribute point quantities in a manner that would account for the “friction” of an underlying terrain. Given a friction layer like [the left figure], for example (where darker shades of grey represent higher frictions), such a surface might look like [the middle figure] rather than [the right figure].


Images courtesy of C. Dana Tomlin.

Tomlin goes on to suggest that this type of operation would be particularly amenable to the distributed computation techniques we developed in our GPU research and are continuing to develop with GeoTrellis. That’s an exciting idea for our GeoTrellis team. The forthcoming 0.8 release does not include non-Euclidean cost-distance operations, but they are a strong contender for new features in 0.9.

Back to the Jackalope

The KDE operation demonstrates that the densest population of jackalopes in Eastern Wyoming, near the North Platte river west of Douglas. Now we can define a study area of manageable scale, can procure the data sets we will need to build our habitat model, and really get in to some map algebra.

In part two of this series, I will explore two focal map algebra functions that act on both an immediate vicinity to a given location and an extended vicinity, as well as a local map algebra function for layer reclassification.

Kernel Density in GeoTrellis

While my objective for this series is to highlight some of the Map Algebra features we are developing in our open source GeoTrellis framework, I did not show how this would work from a developer’s perspective. If you’re curious, you can find some docs for the Kernel Density and Convolution (a more more general term for raster transformations that use a neighborhood) operations at:

Convolve operation in GeoTrellis
Kernel Density operation in GeoTrellis

Perhaps one of the engineers on the GeoTrellis team will take this up in a future Azavea Labs article.

* This is a completely fictional and fabricated data set.

** The biologist is also made up.

Applying Map Algebra – Part 1

Applying Map Algebra – Part 1

Finding the Jackalope

Kernel Tomlin’s Focal Brigade

Back to the Jackalope

Kernel Density in GeoTrellis