Analyzing OpenTreeMap Data with PostGIS

In this blog, I’ll be continuing the work I started in Loading Spatial Data into PostGIS with QGIS. Last time I talked about how QGIS could be used as a means of loading data into PostGIS databases quickly with the QGIS DB Manager and Boundless’ OpenGeo Suite.

This time we’ll be using PostGIS to investigate the distribution of mapped trees in Edmonton’s OpenTreeMap site, yegTreeMap, with simple PostGIS queries. Each city that participates in OpenTreeMap has access to a wealth of insights. All it takes to unlock them is a spatial database and a few lines of code.

Requirements

This tutorial assumes that you have the OpenTreeMap data for Edmonton in your PostGIS database. My previous blog entry in this series should get you up to speed downloading the data and loading it into PostGIS.

Note that while the file and field names can change, just about any overlapping point and polygon data will suffice for these queries–it just has to be in a PostGIS database. This also means that you may use CartoDB for the analysis since it stores its data in PostGIS.

Getting Started in QGIS

First, make sure your PostGIS database is running. Usually that just involves running the PostGIS executable on your machine. Next, open up ‘DB Manager’ in QGIS. If you’re using CartoDB to follow along, just open the SQL tab. In QGIS, open up the DB Manager:

Once in DB Manager, click the SQL Query Icon (a paper with a wrench) and it will open the query manager. The top empty box is where you write SQL queries, and the bottom box shows the results.

Spatial Queries with PostGIS

Each of the following queries uses what’s known as a spatial join. Spatial joins append information from one layer to another based on overlap, and are one of the key features of a spatial database. In a way, the shared key for the join is space rather than an ID field. The queries below assume you have two different datasets in your database: Edmonton trees, and neighborhoods.

Count Trees and Display by Neighborhoods

We’ll start out with this simple query to count up the number of trees per neighborhood using the standard SQL COUNT() function. The spatial part is the PostGIS function ST_Intersects(). ST_Intersects() looks for overlap in the geometry, which is stored in a field called the_geom. CartoDB uses an additional geometry field called the_geom_webmercator. CartoDB’s documents explain more about how to use that geometry field. Using the top SQL window, run the following query:

SELECT hoods.hoodname,
COUNT(trees.gid) AS tree_count
FROM edmonton_trees AS trees
JOIN neighborhoods AS hoods
ON ST_Intersects(trees.the_geom, hoods.the_geom)
GROUP BY hoods.hoodname
ORDER BY tree_count DESC

The query should result in a table of the number of trees per neighborhood in descending order:

We can see that the neighborhood with the most overall trees is Summerside, Followed by River Valley Rundle, and then Terwillegar Towne.

Calculate Tree Density Per Neighborhood

Finding the number of trees per neighborhood is interesting, but larger neighborhoods might contain more trees by virtue of their size. This next query will normalize the number of trees by each neighborhood’s area. COUNT() is used to divide the number of each neighborhood’s trees by its area, and then ST_Intersects() performs the spatial join. This results in trees per square kilometer, which in this query has an alias called trees_km2.

SELECT
hoods.gid,
hoods.the_geom,
hoods.hoodname,
COUNT(trees.gid) / hoods.area_km2 AS trees_km2
FROM edmonton_trees AS trees
JOIN neighborhoods AS hoods
ON ST_Intersects(trees.the_geom, hoods.the_geom)
GROUP BY hoods.hoodname, hoods.gid, hoods.the_geom
ORDER BY trees_km2 DESC

The resulting table shows that the top two most tree-dense areas are Virginia Park, and Mill Woods Park with 3,100 and 2,693 trees per square kilometer, respectively. Summerside and River Valley Rundle are no longer in the top ten.

Find the Widest Tree by Neighborhood

Next, let’s try refining our queries a bit and find distinct features within each neighborhood. What if rather than summarizing all the trees in the neighborhood, we wanted to find only the widest tree in each neighborhood? This query once again employs ST_Intersects(). The standard SQL MAX() function finds the widest tree from the diameter field.

SELECT
hoods.hoodname,
trees.species__c AS treespecies,
MAX(trees.diameter) AS tree_diam
FROM edmonton_trees AS trees
JOIN neighborhoods AS hoods
ON ST_Intersects(trees.the_geom, hoods.the_geom)
GROUP BY hoods.hoodname, hoods.gid, hoods.the_geom, trees.species__c
ORDER BY tree_diam DESC

The resulting table shows that the largest single tree of any species in the OpenTreeMap database for Edmonton is a Willow tree in Westbrook Estates. Willows, Maple, Elm, and Ash round out the rest of the largest trees.

Widest Tree By Neighborhood By Species

Let’s take the previous query one step further. What if rather than the single widest tree in each neighborhood, we wanted the single widest tree of every species present in each neighborhood? This query places ST_Intersect() in a sub-query. Type it into the SQL editor and run it. It may take 10-20 seconds to complete. Note that we’re also selecting everything (‘*‘ is SQL for everything), so the table that is returned has all the original fields.

SELECT *
FROM edmonton_trees,
(SELECT
MAX(trees.gid) AS treeid,
hoods.hoodname,
trees.species__c,
MAX(trees.diameter) AS tree_diam
FROM edmonton_trees AS trees
JOIN neighborhoods AS hoods
ON ST_Intersects(trees.the_geom, hoods.the_geom)
GROUP BY hoods.hoodname, trees.species__c) AS sub
WHERE edmonton_trees.gid = sub.treeid

The resulting table should have around 8,600 records. While species will repeat across neighborhoods in the table, they will not repeat within the same neighborhood. The resulting data also has a secondary function: by finding the widest tree of each unique species per neighborhood, the result can act as a proxy for the tree biodiversity within each neighborhood. Azavea’s OpenTreeMap team discusses this more in their blog post, Seeing the Forest for the Trees: Interpreting Data from Tree Inventories (Part One).

Export Query Results as Maps

Next, let’s export the maps from the PostGIS database in QGIS. The SQL Window in the DB Manager of QGIS allows users to export query results directly into QGIS as a new layer. Run one of the queries above and then look for the button labeled ‘Load as new layer’ below the result pane. This should expand a menu.

Click the ‘Retrieve columns’ button on the right and it should automatically grab the gid and the_geom. Finally, give the layer a name, and then press ‘Load now!’ This should add the layer to QGIS’ table of contents.

If it doesn’t, you can work around this issue by using pgAdmin, which should have come with OpenGeoSuite in the prior installment of this guide. Use pgAdmin to navigate to connect to your PostGIS database and use the SQL editor to perform one of the queries above. Test that it works, and then run it with the ‘Execute Query, Write Result to File’ option. This should save the resulting table as a CSV. This CSV can then be loaded back into QGIS, ArcGIS, or CartoDB as a spatial layer.

Style a Map

Once you’ve managed to either load your query result as a new layer in QGIS, or export the resulting query as a CSV with pgAdmin, you should make a map and show off your hard work! For example, I loaded and styled the data in CartoDB for the OpenTreeMap team:

Next Steps

These queries only scratch the surface of what PostGIS can do. PostGIS can reduce a traditionally labor-intense GIS task to a few lines of code, which can be reused over and over. Each of the code snippets above can be quickly repurposed and reused for just about any point and polygon data.

If you’d like to take your PostGIS a bit further, Chris Whong of CartoDB recently published a blog about PostGIS queries that replicate many common GIS geoprocessing tasks present in QGIS. Try using PostGIS on your own machine or in CartoDB and see how much time you can save with common geographic summary and processing tasks.