Summer of Maps Student Applications are Open

Summer of Maps banner

 

Today we opened up the applications for student fellowships for 2013 Azavea Summer of Maps. Summer of Maps offers fellowships to student GIS analysts to perform geographic data analysis for non-profit organizations. We are going to match up non-profit organizations that have spatial analysis and data visualization needs with talented students of GIS analysis to implement projects over a three-month period during the summer. Here’s the remaining schedule for the selection process:

  • Mar 1 – Sun, Mar 17, 11pm – Students submit proposals and applications
  • Mid-March – Early April – Top candidates are interviewed in Philadelphia
  • Mon, Apr 15 – Successful Summer of Maps fellows will be announced
  • mid-May – August – Summer of Maps fellows work on spatial analysis projects

 

What’s in it for the non-profit orgs?

  • Receive pro bono services from a talented student GIS analyst to geographically analyze and visualize your data
  • Visualize your data in new ways
  • Combine your data with other demographic and geographic data to draw new observations
  • Receive high quality maps that can be used to make a case to funders or support new initiatives

What’s in it for the students?

  • Work on a spatial analysis project that supports the social mission of two non-profit organizations
  • Work with Azavea mentors to improve your GIS skills
  • Receive a monthly stipend
  • Gain professional work experience implementing a real-world GIS project

If you are a student age 18 or over, you can submit an application from now through March 17, 11pm ET.

 

Learn more about Azavea Summer of Maps at SummerOfMaps.org.

 

Fellowship Sponsors

We want to thank the following organizations for sponsoring a fellow:

Big round of applause to them. Their sponsorships will enable us to expand the program this year.

 

New FedGeo Day Event in DC

Azavea is proud to be sponsoring FedGeo Day, a new one-day event on Feb 28 that will focus on open source geospatial tools that are being used by federal government agencies. Much of the conference schedule has now been posted. It’s a single jam-packed day of case studies, panel discussions and technology showcases.

Azavea will be presenting on:

  • Distributed, real-time geoprocessing with GeoTrellis
  • Spatial analysis of water infrastructure with the Army Corps of Engineers

Check out the full schedule at fedgeoday.com. You can register at fedgeoday.eventbrite.com. For ongoing news, follow @fed_geo on Twitter.

Applying Map Algebra – Part 2

This is the second in a series of articles on applying Map Algebra to solving problems. In the last installment, I began developing a habitat model as an example through which to apply map algebra. Now that there is a defined study area, it is time to dig into the model itself.

HABITAT MODELING: A BRIEF INTRO

After years of study, our wildlife biologist has not only compiled a detailed data set of jackalope observations, but has also written a habitat preference description complete with measures of importance for each aspect of the habitat. We have all of our jackalope habitat preference data, described in narrative form, but how do we translate that into a cartographic model?

The text of the study identifies a number of geographic characteristics that make for attractive jackalope habitat. Taken alone, these features may not be enough to provide the proper environment, but where the features are in proximity to each other or overlap, the potential is higher for prime jackalope territory. To translate the conceptual model to a geographic one, we might follow these steps:

  1. Extract quantifiable qualities of the landscape and weights of importance for each from the text.
  2. Gather spatial data that is available for the study area and relates to the above qualities.
  3. “Ask quantitative questions” of the raw data to generate derivative spatial layers that represent the above qualities (using map algebra!).
  4. Combine the derived layers according to their weights of importance (weighted overlay) to create a cartographic model of the prime habitat.

The final result will be a graduated raster surface of habitat quality, with the highest scores representing prime habitat, middle scores representing marginal habitat, and low scores representing poor habitat.
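To make the weighted overlay step concrete, here is a minimal sketch in plain Python. The two input grids and the weights of 3 and 1 are invented for illustration; they are not the values from the jackalope study, and a real workflow would use a raster library rather than nested lists.

```python
def weighted_overlay(layers, weights):
    """Combine equally sized score grids into one weighted surface.

    Each output cell is the weighted average of the same cell in
    every input layer, so a cell scoring 1 everywhere ends up at 1.0.
    """
    total = sum(weights)
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [
            sum(w * layer[r][c] for layer, w in zip(layers, weights)) / total
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 2x2 derived layers: 1 = meets the preference, 0 = does not.
near_water = [[1, 1], [0, 0]]
flat_land = [[1, 0], [1, 0]]

# Hypothetical weights: distance to water counts three times as much as slope.
habitat = weighted_overlay([near_water, flat_land], weights=[3, 1])
print(habitat)
```

Cells that satisfy both preferences score highest, cells that satisfy only the heavily weighted preference land in the middle, and cells that satisfy neither score zero.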

BREAKING DOWN THE JACKALOPE MODEL

The relevant information of the habitat preferences description can be summarized as follows:

  • Near perennial sources of water, but not too near
  • Flat to slightly sloped land, not too steep
  • “Pockets” in rough terrain
  • Areas of no more than 150m of elevation change
  • Areas without a lot of human development
  • Near edges of agricultural land

We’ll get into each of these in detail as we develop the model. The data sets that will be used to derive individual habitat preference layers are the National Elevation Dataset (NED), the National Hydrography Dataset (NHD), and the National Land Cover Database (NLCD). These are all available for all locations in the U.S. (as you might expect from the names) and can be downloaded from the USGS.

NED elevation and NHD perennial streams

National Land Cover Database (NLCD)

WATERING HOLES

Generating the first habitat factor requires three steps:  identifying perennial water sources, calculating proximity, and reclassifying the proximity to prepare it for the weighted overlay. The NHD has attribute data that identifies hydrologic flow lines as perennial, intermittent, ephemeral, etc., so we can query out the perennial streams easily.

Calculating the distance to these lines requires a map algebra operation called FocalProximity.

Stepping back: Dana Tomlin, in his book GIS and Cartographic Modeling, organizes map algebra operations into three main categories based on the way they interact with geographic data.

  • Local – calculate a location’s value based on the values of that location in one or more layers
  • Focal – calculate a location’s value based on its relationship to the cells around it on the same layer
  • Zonal – calculate a location’s value based on where it falls in a second layer that defines zones of variable shape and size
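The three categories above can be sketched with toy grids in plain Python. The function names and the specific statistics chosen (a sum, a maximum, and a mean) are just illustrative picks; Tomlin's framework covers many more operations in each category.

```python
def local_sum(a, b):
    """Local: each cell depends only on the same cell in the input layers."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def focal_max(grid):
    """Focal: each cell depends on its 3x3 neighborhood in the same layer."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out_row.append(max(window))
        out.append(out_row)
    return out


def zonal_mean(values, zones):
    """Zonal: each cell receives the mean of all cells sharing its zone id."""
    sums, counts = {}, {}
    for value_row, zone_row in zip(values, zones):
        for v, z in zip(value_row, zone_row):
            sums[z] = sums.get(z, 0) + v
            counts[z] = counts.get(z, 0) + 1
    return [[sums[z] / counts[z] for z in zone_row] for zone_row in zones]


elevation = [[1, 2], [3, 4]]
zones = [[0, 0], [1, 1]]  # hypothetical zone ids: top row and bottom row

print(local_sum(elevation, elevation))
print(focal_max(elevation))
print(zonal_mean(elevation, zones))
```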

We looked at a Focal operation in the last installment, FocalDistribution, which is closely related to a kernel density estimate and required several paragraphs of explanation. FocalProximity is one of the more elementary Focal operations. It essentially measures the distance from each raster cell to the nearest specified spatial feature, and assigns that distance to the cell. The ArcGIS Spatial Analyst extension refers to this as Euclidean Distance.
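As a rough illustration of what a proximity operation computes, the brute-force sketch below measures, for every cell, the straight-line distance to the nearest feature cell. The grid, the 30 meter cell size, and the function name are assumptions made for the example; production tools use far more efficient distance transforms.

```python
import math


def focal_proximity(feature_mask, cell_size=1.0):
    """Distance from every cell to the nearest cell flagged as a feature.

    feature_mask is a grid of 0/1 values; at least one cell must be 1.
    Distances are measured between cell centers, in map units.
    """
    targets = [
        (r, c)
        for r, row in enumerate(feature_mask)
        for c, flag in enumerate(row)
        if flag
    ]
    return [
        [
            min(math.hypot(r - tr, c - tc) for tr, tc in targets) * cell_size
            for c in range(len(row))
        ]
        for r, row in enumerate(feature_mask)
    ]


# Hypothetical rasterized stream occupying the top-left cell of a 2x3 grid.
streams = [[1, 0, 0],
           [0, 0, 0]]
distance_m = focal_proximity(streams, cell_size=30.0)
print(distance_m)
```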

 

So we ran a FocalProximity on our perennial stream data, and the result ranges from distances of 0 meters to over 3,500 meters. Our distance to water habitat preference, “Near perennial sources of water, but not too near”, is more precisely defined as “between 100 and 1,000 meters from perennial water sources”. The next step is to identify those bands of distance by running a LocalReclassification operation.

“Reclassification” will organize the distance values into two categories: valid habitat and otherwise. We’ll represent these numerically as 1 and 0. For each raster cell, the operation will see if the value is between 100 and 1,000, and classify the cell appropriately. The result is our first derived habitat preference map – distance to water.
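The reclassification described above amounts to a per-cell range test. Here is a small sketch; the sample distances are made up, and treating both 100 and 1,000 as inside the valid band is an assumption, since the habitat description only says "between".

```python
def reclassify_range(grid, low, high):
    """Return 1 where low <= value <= high, otherwise 0."""
    return [[1 if low <= value <= high else 0 for value in row] for row in grid]


# Hypothetical distances to perennial water, in meters.
distance_m = [[0.0, 150.0, 999.0],
              [1000.0, 1200.0, 3500.0]]

near_water = reclassify_range(distance_m, 100, 1000)
print(near_water)
```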

 

FLATLANDERS

Jackalopes tend to prefer flat terrain, which is the second habitat preference we will derive. The habitat preference details specify the slope cutoff as 8 degrees.

FocalGradient is the operation we will use to calculate slope against the NED. This Focal operation derives a slope for a given location based on that location’s relationship to its immediate vicinity. The elevations of the surrounding raster cells are used to calculate a best-fit, three-dimensional plane. It is also possible to determine slope direction during this calculation, but we are only interested in the degree of the slope in this case.
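For readers who want to see the best-fit plane idea in code, here is a sketch for a single 3x3 window. It fits a least-squares plane to the nine elevations (on a regular grid this reduces to differences of column and row sums) and returns the plane's slope in degrees. This is an illustration of the concept, not the exact algorithm used by ArcGIS or GeoTrellis, and the function name is hypothetical.

```python
import math


def plane_fit_slope_degrees(window, cell_size):
    """Slope of the least-squares plane through a 3x3 elevation window.

    window[0] is the northern row and window[r][0] the western column.
    On a regular grid the fitted gradients are the difference between
    opposite column (or row) sums divided by 6 * cell_size.
    """
    west = sum(row[0] for row in window)
    east = sum(row[2] for row in window)
    north = sum(window[0])
    south = sum(window[2])
    dz_dx = (east - west) / (6.0 * cell_size)
    dz_dy = (north - south) / (6.0 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Hypothetical window rising 10 m per 10 m cell toward the east.
ramp = [[0, 10, 20],
        [0, 10, 20],
        [0, 10, 20]]
flat = [[5, 5, 5]] * 3

print(plane_fit_slope_degrees(ramp, cell_size=10.0))
print(plane_fit_slope_degrees(flat, cell_size=10.0))
```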

The ArcGIS Spatial Analyst extension offers two options for slope calculation output: degrees and percent. Degrees measure the angle of the incline, whereas percent measures the ratio of rise over run. For many engineering applications, percent would be the more logical choice, but a jackalope cares nothing for this distinction.

Another option that ArcGIS provides is a Z factor, which allows for a unit conversion between horizontal and vertical coordinates. For example, you may find elevation data projected to a State Plane Coordinate System in US Feet, although the vertical units are meters. In this situation, you would want to use 3.28084 as the Z factor. The NED data uses meters for both the horizontal and vertical units, so the default Z factor of 1 is acceptable.
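The relationship between the two output units and the Z factor can be shown with a few lines of arithmetic. The helper below is a hypothetical illustration, not an ArcGIS function: it scales the rise by the Z factor so that rise and run share units, then reports the slope both ways.

```python
import math


def slope(rise, run, z_factor=1.0):
    """Return (degrees, percent) for a rise over a run.

    z_factor converts the vertical units of rise into the horizontal
    units of run, e.g. 3.28084 for meters of rise over feet of run.
    """
    ratio = (rise * z_factor) / run
    return math.degrees(math.atan(ratio)), ratio * 100.0


# Same units for rise and run: a 1:1 incline.
deg_equal, pct_equal = slope(1.0, 1.0)

# 10 meters of rise over 100 feet of run needs the Z factor.
deg_mixed, pct_mixed = slope(10.0, 100.0, z_factor=3.28084)

print(deg_equal, pct_equal)
print(deg_mixed, pct_mixed)
```

Note that percent slope is not capped at 100: a 1:1 incline is 100 percent, and steeper inclines exceed it, while degrees approach but never reach 90.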

We will reclassify the results of our FocalGradient operation as we did the proximity to water layer. For our slope layer, anything 8 or lower will receive a score of 1, and anything greater than 8 will become a 0. The result is our second habitat preference layer – flat areas.

There are still more layers to derive before we are ready for a weighted overlay calculation. In the next installment, we’ll look at some Focal operations that calculate a location’s values based on the location’s relationship to its extended vicinity, sometimes referred to as neighborhood operations.

FOCALGRADIENT AND RECLASSIFICATION IN GEOTRELLIS

You can find the equivalent operations in GeoTrellis:

  • FocalGradient – Calculate FocalGradient using the op.focal.Slope operation
  • LocalReclassify – The GeoTrellis team is working on a more general LocalReclassify operation for the 0.9 release, but in the current version (v 0.8), the If/Else operation can be used to set one of two integer values for each cell based on whether or not the cell satisfies a condition. This will work for the types of reclassifications performed above.

2013 Summer of Maps is Open!!

Summer of Maps banner

 

It’s been frigidly cold here in Philadelphia this past week, but we are already thinking about the summer.  I am pleased to announce the 2013 Azavea Summer of Maps.  Summer of Maps offers fellowships to student GIS analysts to perform geographic data analysis for non-profit organizations.  We are going to match up non-profit organizations that have spatial analysis and visualization needs with talented students of GIS analysis to implement projects over a three-month period during the summer.  Here’s how it will work:

  • Jan 21 – Sun, Feb 10, 11pm – Non-profit organizations can submit brief proposals for spatial analysis projects to Azavea
  • Feb 11 – Feb 28 – Azavea program administrators review organizations
  • Mar 1 – Sun, Mar 17, 11pm - Students submit proposals and applications
  • Mid-March – Early April – Top candidates are interviewed in Philadelphia
  • Mon, Apr 15 - Successful Summer of Maps fellows will be announced
  • mid-May – August – Summer of Maps fellows work on spatial analysis projects

If you participated in the program last year, you’ll notice that we’ve moved the dates up a bit. If we make additional changes, we’ll post them on the main web page.

What’s in it for the non-profit orgs?

  • Receive pro bono services from a talented student GIS analyst to geographically analyze and visualize your data
  • Visualize your data in new ways
  • Combine your data with other demographic and geographic data to draw new observations
  • Receive high quality maps that can be used to make a case to funders or support new initiatives

What’s in it for the students?

  • Work on a spatial analysis project that supports the social mission of a non-profit organization
  • Work with Azavea mentors to improve your GIS skills
  • Receive a monthly stipend
  • Gain work experience implementing a GIS project

If you are a non-profit organization and have a project you would like to see implemented, please submit an application.  Deadline is Sun Feb 10, 11pm EST.  If you are a student, stay tuned – applications will open March 1, 2013.

 

Learn more about Azavea Summer of Maps at www.azavea.com/summer-of-maps/.

 

Fellowship Sponsors

We want to thank the following organizations for sponsoring a fellow:

Big round of applause to them. Their sponsorships will enable us to expand the program this year.

 

Applying Map Algebra – Part 1

 

In October, GIS and Cartographic Modeling by Dana Tomlin was re-released, and Azavea now has a few copies in the office. I was reading through one of the copies and got into a conversation with Josh Marcus, the lead engineer on our GeoTrellis team, which is working towards a new release (0.8) that incorporates many of the map algebra concepts detailed in the book. Over a series of blog posts, we will take a look at map algebra functions (that are currently or will be soon incorporated into GeoTrellis) by creating a site suitability model using wildlife habitat preferences.

Finding the Jackalope

For this model, I have selected recently available data of nationwide sightings of the Western Jackalope* and descriptions of their habitat preferences by a reputable wildlife biologist**. We will begin the analysis by defining a study area in a region of high jackalope population, then derive several habitat preference layers using map algebra and different geographic data sets that are publicly available.

Point Distribution

Our first exercise will be running a kernel density estimation (KDE) on this point distribution to identify an optimal study area in which to focus. Some of the points represent multiple sightings and are coded as such in their attributes. This will affect the KDE, so it’s important to know. KDE is a statistical method for estimating the probability density of a variable. When approached mathematically, it bears a close relationship to a histogram, but with an adjustable “kernel” bandwidth that produces a smooth curve. This bandwidth is effectively a radius when applied to points in two-dimensional space.

KDE Settings

I am running these calculations in ArcGIS using the Spatial Analyst extension. When preparing the calculation, the software requests a population field (in this case, the number of sightings in a particular location) and a search radius. If the search radius is too small, the resulting density map will be too localized. If it’s too large, the result won’t provide enough detail. Since we’re dealing with a geographic extent of several Western states and a relatively low count of sightings, I found that a search radius of 100 kilometers provided the appropriate level of smoothing. Getting the right kernel size can sometimes be a trial and error process.

Kernel Density Result

Kernel Tomlin’s Focal Brigade

While KDE is not specifically referenced in Tomlin’s book, it is an example of what he refers to as FocalDistribution. FocalDistribution is a type of neighborhood calculation that results in a “sum of contributions from all locations within its neighborhood, each of which contributes a portion of its value … to the locations around it such that those portions diminish with distance.” So the farther from a jackalope sighting we go, the lower the value of that sighting. Each of the sighting locations has its own bubble of value, decreasing with distance. If we were to add them all together, it would look like the result of the KDE analysis.
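The "sum of contributions that diminish with distance" idea can be sketched directly. In the toy example below, every sighting spreads its count over the cells within a search radius, and the contributions are added up. The kernel shape used here, (1 - (d/r)^2)^2, is an assumption chosen for simplicity, the surface is not normalized the way a statistical density would be, and the sightings are invented.

```python
import math


def focal_distribution(points, rows, cols, radius):
    """Sum distance-weighted contributions from weighted points.

    points is a list of (row, col, count) tuples; radius is in cells.
    A point contributes count * (1 - (d / radius)**2)**2 to each cell
    closer than radius, and nothing beyond it.
    """
    surface = [[0.0] * cols for _ in range(rows)]
    for point_row, point_col, count in points:
        for r in range(rows):
            for c in range(cols):
                d = math.hypot(r - point_row, c - point_col)
                if d < radius:
                    surface[r][c] += count * (1 - (d / radius) ** 2) ** 2
    return surface


# Hypothetical sightings: three animals at one location, one at another.
sightings = [(1, 1, 3), (1, 3, 1)]
density = focal_distribution(sightings, rows=3, cols=5, radius=2.0)

for row in density:
    print([round(value, 3) for value in row])
```

The cell between the two sightings receives a contribution from both, which is how overlapping "bubbles of value" build up a continuous surface.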

In the Kernel Density tool included in the ArcGIS Spatial Analyst extension, the user is presented with a few optional parameters and two critical ones: a “search radius” and a “population field”. The “population field” parameter provides the ability to specify a population to be spread across the kernel in order to create a more continuous surface. It can be confusing because, while population is listed as “required” and search radius is listed as “optional”, a population field is not actually necessary, and a value of “NONE” is very common for things like crime and other locations that don’t include observations with a count. Search radius value actually is required, and Spatial Analyst simply uses a default value if you don’t specify one (according to the docs, the default is “the shortest of the width or height of the extent of input features in the output spatial reference, divided by 30”).
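The default quoted from the docs is simple enough to write out. The function below is only a restatement of that rule for illustration, with a made-up extent; it is not part of any ArcGIS API.

```python
def default_search_radius(xmin, ymin, xmax, ymax):
    """Shorter side of the input extent divided by 30, per the quoted rule."""
    width = xmax - xmin
    height = ymax - ymin
    return min(width, height) / 30.0


# Hypothetical extent of 900 km by 600 km, in meters.
radius_m = default_search_radius(0, 0, 900_000, 600_000)
print(radius_m)
```

For an extent this size, the default comes out well below the 100 kilometer radius chosen for the jackalope analysis, which is one reason it is worth setting the radius explicitly.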

While I was working with points in this example, Esri’s Kernel Density tool also supports line density, which is pretty useful if you’d like to generate a “density of road network” or similar surface based on a polyline layer.  One limitation of Esri’s implementation is that it assumes a “Gaussian” or normal distribution to the histogram that defines the kernel, but KDE operations in other implementations sometimes offer alternative distributions, such as quadratic or even custom kernels.  Some implementations even support an “adaptive kernel” that changes in size based on local conditions. It is important to note that Tomlin’s conceptual framework for Focal operations incorporates a much broader set of options than is commonly implemented in GIS software, and FocalDistribution is no exception.  The operation can be modified through the use of the following keywords:

  • The keywords “at DISTANCE by DIRECTION” support generalization of the normally circular kernel neighborhoods to include alternative kernel shapes like doughnuts and wedges.
  • The “FocalDistribution radiating” operation generalizes even further to include non-linear neighborhoods that include line-of-sight.
  • The “FocalDistribution spreading” operation adds support for travel-cost neighborhoods.

The latter couple of options are particularly interesting as they suggest a type of non-Euclidean Kernel Density that could incorporate the concept of “friction” through which the density is computed. In personal correspondence with Tomlin related to Azavea’s NSF-supported research into GPU-based raster computation, Tomlin writes:

To the best of my knowledge, there has never been a non-Euclidean version of Kernel Density: one that would redistribute point quantities in a manner that would account for the “friction” of an underlying terrain. Given a friction layer like [the left figure], for example (where darker shades of grey represent higher frictions), such a surface might look like [the middle figure] rather than [the right figure].

Images courtesy of C. Dana Tomlin.
Tomlin goes on to suggest that this type of operation would be particularly amenable to the distributed computation techniques we developed in our GPU research and are continuing to develop with GeoTrellis. That’s an exciting idea for our GeoTrellis team. The forthcoming 0.8 release does not include non-Euclidean cost-distance operations, but they are a strong contender for new features in 0.9.

Back to the Jackalope

The KDE operation demonstrates that the densest population of jackalopes is in Eastern Wyoming, near the North Platte River west of Douglas. Now we can define a study area of manageable scale, procure the data sets we will need to build our habitat model, and really get into some map algebra.

In part two of this series, I will explore two focal map algebra functions that act on both an immediate vicinity to a given location and an extended vicinity, as well as a local map algebra function for layer reclassification.

Study Area Map

Kernel Density in GeoTrellis

While my objective for this series is to highlight some of the Map Algebra features we are developing in our open source GeoTrellis framework, I did not show how this would work from a developer’s perspective. If you’re curious, you can find some docs for the Kernel Density and Convolution (a more general term for raster transformations that use a neighborhood) operations at:

Perhaps one of the engineers on the GeoTrellis team will take this up in a future Azavea Labs article.

* This is a completely fictional and fabricated data set.

** The biologist is also made up.

Webinar Recording: HunchLab 102 & 103

We recently completed the second and third webinars in our HunchLab training series.

The second webinar, HunchLab 102, covered how the Hunches within HunchLab detect localized spikes in crime (an early warning system) and outlined how a user interacts with the application. While HunchLab applies this process to crime data, the underlying method is applicable to other space-time event data such as real estate transactions, tweets, etc.

 

The third webinar, HunchLab 103, looked at the risk forecasting techniques within HunchLab and how to interpret their output. In particular, the software automates two processes. The first process is identifying short-term elevations in risk due to the contagion effect of crime within a small geographic area (near repeat pattern analysis). The second process is predicting aggregate incident load based upon various temporal cycles (the day of the week, the time of the year, and the time of the day).

Recorded Webinar: How to Conquer Post-Election Data Chaos with the Cicero API

On Friday, December 14, we hosted “How to Conquer your Post-Election Data Chaos with Cicero,” a webinar that examined five “data chaos factors” that nonprofits and political advocacy groups are facing in these interim weeks after the 2012 elections in the US and before the newly elected officials and new legislative sessions start in January of next year. With Azavea’s political background, and as we were updating the database behind Cicero with the more than 8,000 official records that changed on November 6th, we noticed accelerating trends and saw common pain points many nonprofits are facing just after an election in a redistricting year like 2012. The Cicero API and other techniques and technologies we know, like spatial analysis, event registration, and social media, can be powerful when applied to nonprofit advocacy work during this chaotic time.

A recording of the webinar is available below. I’ve also posted the recording on YouTube and made the slides available via SlideShare. If you have any questions about the webinar or Cicero, please feel free to email me at athompson@azavea.com.

We’re hoping to hold more webinars and screencasts on Cicero and our other political advocacy services like spatial analysis in 2013! Let me know if you have an advocacy technology issue you think would make a good webinar, or if you’ve tried to use Cicero in the past and would like some screencasts or other tutorials about how to do things with the API. As Azavea’s Community Evangelist, I try hard to be a helpful resource and advocate for any users of the API and always love to hear about websites or apps you’ve built with Cicero or ideas you have. Please, shoot me an email or send me a tweet at @andrewbt!