Azavea Atlas

Maps, geography and the web

Announcing the FOSS4G North America 2015 Program Committee

It’s my honor to introduce the Program Committee for FOSS4G North America 2015. The committee brings together a variety of perspectives from the geospatial open source community that will inform the creation of a great program.

Kate Chapman

Kate Chapman is a founder and the Executive Director of the Humanitarian OpenStreetMap Team (HOT), a non-governmental organization dedicated to supporting communities, governments and humanitarian responders in their use of OpenStreetMap for crisis response and contingency planning. She was a keynote speaker at FOSS4G 2013 in Nottingham and currently serves on the steering committee for FOSS4G Asia, to be held in Bangkok, Thailand in December 2014 (that conference is still accepting abstracts; submit here: http://www.foss4g-asia.org/2014/programmenu/submit-paper/).

Regina Obe

Regina is a member of the PostGIS Steering Committee and has been an OSGeo Charter Member since 2009. She is co-author of PostGIS in Action (Manning Publications, 2011) and co-owner of the Paragon Corporation, a Boston-based consulting firm.

Jody Garnett

Jody is a senior software engineer at Boundless who serves on the Project Steering Committees of the GeoServer, GeoTools and uDig projects. He has been an OSGeo charter member since 2006 and actively participates in outreach to new projects as a member of both LocationTech and OSGeo.

Andy Petrella

Andy is the Lead Developer at NextLab in Liège, Belgium. He specializes in GIS software development as well as distributed systems, and holds degrees in Mathematics and Informatics with a specialization in Geometrology-Geomatics Sciences from the University of Liège.

Beth Tellman

Beth is a National Science Foundation Fellow and Gilbert White Fellow and a doctoral student in Geography at Arizona State University. She uses geospatial data and software to research how changes in land use affect ecosystem services for flood mitigation. In 2009 she co-founded and directed an NGO, the CEIBA Foundation, to help rural Salvadoran communities attain disaster resilience. She holds an MESc in Environmental Science from the Yale School of Forestry and Environmental Studies.

Rob Emanuele (Chair)

Rob is a geospatial software developer at Azavea in Philadelphia. He is the lead of the GeoTrellis project and is currently guiding it through the LocationTech incubation process.

Summer of Maps: Creating a Cost Distance Surface to Measure Park Access

This entry is part 4 of 6 in the series Summer of Maps 2014

Summer of Maps logo

Now in its third year, Azavea’s Summer of Maps Program has become an important resource for non-profits and student GIS analysts alike. Non-profits receive pro bono spatial analysis work that can enhance their business decision-making processes and programmatic activities, while students benefit from Azavea mentors’ experience and expertise. This year, three fellows worked on projects for six organizations that spanned a variety of topics and geographic regions. This blog series documents some of their accomplishments and challenges during their fellowship. Our 2014 sponsors, Google, Esri and PennDesign, helped make this program possible. For more information about the program, please fill out the form on the Summer of Maps website.

 

When you’re navigating the real world, the shortest distance between Point A and Point B is rarely a straight line. Instead, there are twists and turns in the path, difficult terrain, and impassable roadblocks that force you to detour and slow your journey (a lot like life!). This reality has important implications for understanding how close you are to things, whether you’re planning a cross-country road trip or simply walking to your closest neighborhood park.

A common way to measure proximity in a GIS analysis is with a simple buffer tool, which measures a set, straight-line distance away from the point, line, or polygon boundary of interest and creates a polygon filling that space. This tool has practical applications when you’re looking at basic geographic proximity, but in many cases, measuring straight-line or “as the crow flies” distance just doesn’t make for the most robust analysis.

One of these cases arose from my work this summer with the Community Design Collaborative to prioritize parks, playgrounds and schoolyards in Philadelphia for potential grants for revitalization or redesign efforts. Among the many factors that were considered for selecting candidate sites was park accessibility. We wanted to visualize where people in Philadelphia are far from existing parks so that we could identify schoolyards or community gardens in these parts of the city that could potentially be redesigned as neighborhood play spaces to serve communities lacking a nearby park. Since we are studying an urban environment and are particularly focused on access for children, we considered walking along sidewalks to be the dominant form of travel to parks.

One option for incorporating park access based on walking would have been to use Network Analyst tools in ArcGIS to develop sophisticated “service areas” for each park using the street network. The issue with this option, in this case, was simply the scale of the operation: with over 400 parks included in our analysis, the service area calculation would have been far too time-consuming and prone to ArcGIS crashes due to system overload. A more manageable way to illustrate park access using the street network was to create a cost distance surface that makes traveling along sidewalks the preferred method of travel.

To do this, we first examined a dataset of Philadelphia streets, acquired from OpenDataPhilly, and eliminated highways and urban freeways from the data, as these cannot serve as walkable routes to local parks. We then converted the streets dataset into raster format and used the Euclidean Distance tool to create a surface of straight-line distances from each cell to a street. The resulting raster was then classified on a scale of 1 to 6 based on its distance values to represent difficulty of travel, with a value of 1 being easiest to travel and 6 being most difficult. This classification establishes the street network and its sidewalks as the preferred route for travel. The map below shows what this classification looks like for a portion of West Philadelphia:

Street distance classification in West Philadelphia
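This reclassification step can be sketched in a few lines of NumPy. The distance values and class breaks below are hypothetical, since the post does not give the actual thresholds used in the analysis:

```python
import numpy as np

# Hypothetical straight-line distances (in feet) from each cell to the
# nearest street centerline, as produced by a Euclidean Distance tool.
distance = np.array([
    [0.0,   5.0,   20.0],
    [45.0,  80.0,  150.0],
    [220.0, 400.0, 900.0],
])

# Assumed upper bounds for difficulty classes 1 through 5; anything
# farther from a street falls into class 6 (most difficult to traverse).
breaks = [10, 50, 100, 250, 500]

# np.digitize returns 0 for values at or below the first break, so
# adding 1 yields travel-difficulty scores from 1 (on or near a street)
# to 6 (deep inside a block).
difficulty = np.digitize(distance, breaks, right=True) + 1
print(difficulty)
# [[1 1 2]
#  [2 3 4]
#  [4 5 6]]
```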

Our next step was to incorporate the park locations for the cost distance calculation. To both simplify our parks data and simulate access to park entry points, we extracted vertex points from our park polygon dataset and set them to be the “source” dataset, or the locations from which cell distances are measured, in the cost distance operation. We then applied the Cost Distance tool in ArcGIS to measure and record distances between each cell and the nearest park access point using the easiest or “cheapest” route. In other words, the distance of a vertex-to-cell route that covers the easiest terrain (for example, a 1- or 2-scored area on the street’s Euclidean distance layer) will be selected as that cell’s cost distance value, while an alternate route over more difficult terrain will not be (perhaps even if the difficult-terrain route is shorter in straight-line distance). For a more detailed breakdown of the algorithm generating cost distance surfaces, ArcGIS Resources provides explanations and examples of how node-to-node cost distance calculations are performed in the software.
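The accumulation idea behind a cost distance surface can be sketched with Dijkstra’s algorithm on a grid. This is a simplified stand-in rather than Esri’s implementation, but it follows the convention the ArcGIS documentation describes: moving between adjacent cells costs the average of their two difficulty values, times 1 for an orthogonal move or √2 for a diagonal move:

```python
import heapq
import math

def cost_distance(cost, sources):
    """Accumulated-cost surface over a grid. Moving between adjacent
    cells costs the average of their difficulty values, multiplied by
    1 for orthogonal moves or sqrt(2) for diagonal moves."""
    rows, cols = len(cost), len(cost[0])
    acc = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:  # source cells, e.g. park entry-point vertices
        acc[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    moves = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
             if (dr, dc) != (0, 0)]
    while heap:
        dist, r, c = heapq.heappop(heap)
        if dist > acc[r][c]:
            continue  # stale queue entry
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = math.sqrt(2) if dr and dc else 1.0
                nd = dist + step * (cost[r][c] + cost[nr][nc]) / 2
                if nd < acc[nr][nc]:
                    acc[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return acc

# Tiny illustration: difficulty 1 along "streets", 6 inside a block.
grid = [
    [1, 1, 1],
    [6, 6, 1],
    [1, 1, 1],
]
acc = cost_distance(grid, [(0, 0)])  # one park access point at top-left

# Cutting straight down through the block would cost 3.5 + 3.5 = 7.0,
# but the route around along street cells accumulates only about 4.83,
# so the cheapest-route distances "hug" the streets.
print(round(acc[2][0], 2))  # 4.83
```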

The map below shows the resulting raster illustrating park proximity. You will notice that the edges of each data class appear to hug the street centerlines, showing that routes along streets and sidewalks are preferred to unrealistic routes that cut through city blocks.

Cost distance raster showing park proximity

With this realistic dataset of park access, we were able to generate a clear picture of the walkable landscape of Philadelphia and identify areas of the city where walking to the park can be a long and tedious trip for children and families. Altogether, the cost distance tool incorporates both travel distance and difficulty of terrain to illustrate real-world proximity and what it really takes to get from Point A to Point B.

Summer of Maps: Raster Versus Vector Visualization

This entry is part 3 of 6 in the series Summer of Maps 2014


Raster Versus Vector Visualization

As a Summer of Maps fellow I worked with two non-profit organizations: Girlstart in Austin, Texas, which empowers girls through Science, Technology, Engineering and Math, and City Harvest in New York City, which rescues food all over the city and distributes it to hunger programs. Both wanted to identify the areas most in need of their services. Girlstart also wanted to identify areas for fundraising.

One of the tasks for both of my projects was to create composite layers built from different, but related, variables. For example, for Girlstart I made a layer of relative wealth in Austin that took into account median home value, educational attainment, and median household income. Since this data was at the census tract level I was working with vector data, but I converted it to raster because I thought a surface of wealth would be both intuitive and pleasing to the eye. A couple of examples of well-known raster maps are Yelp heat maps and weather maps. I was striving for a similar look and feel.

 

Yelp Philly Hipsters heat map | The Weather Channel current temperatures


What’s vector?

“a representation of the world using points, lines, and polygons. Vector models are useful for storing data that has discrete boundaries, such as country borders, land parcels, and streets” (ESRI GIS Dictionary).

What’s raster?

“a representation of the world as a surface divided into a regular grid of cells. Raster models are useful for storing data that varies continuously, as in an aerial photograph, a satellite image, a surface of chemical concentrations, or an elevation surface” (ESRI GIS Dictionary).


Wealth

This is the visualization of wealth in raster format.

 

It definitely wasn’t as beautiful as I had hoped, nor as meaningful. I thought a stretched gradient would provide a nice smooth surface across Central Texas and show more detail. Instead it just looks like really fuzzy tract boundaries. This is because my data were not continuous: they came from polygons, and quite large polygons at that. When rasterized, the cell values are all the same within each polygon, which doesn’t convey much. The process of rasterizing did not add any information or aesthetic value. The vector format below is the better choice. It looks clean and is appropriately symbolized by a color gradient. The tract boundaries are distinct and the wealth ranking is clearly distinguished across the features.
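A tiny NumPy sketch (with made-up tract IDs and wealth values) shows why rasterizing this kind of polygon data adds nothing: every cell inherits its polygon’s single attribute value, so the “surface” is piecewise flat:

```python
import numpy as np

# Hypothetical 4x6 raster where each cell stores the ID of the census
# tract it falls in -- two large tracts covering the whole area.
tract_id = np.array([
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [1, 1, 2, 2, 2, 2],
    [1, 1, 2, 2, 2, 2],
])

# Made-up wealth index per tract (the attribute being rasterized).
wealth = {1: 0.3, 2: 0.8}
surface = np.vectorize(wealth.get)(tract_id)

# Each tract's cells all hold one value, so a stretched color ramp can
# only reveal the tract boundary -- there is no within-tract variation.
print(np.unique(surface[tract_id == 1]))  # [0.3]
print(np.unique(surface[tract_id == 2]))  # [0.8]
```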

 

Wealth 2

This is the visualization of wealth in vector format.

 

Recall that I created composite layers for both of my projects. For City Harvest I made a combined layer of vulnerability based on the percent of people living below the poverty threshold and the percent of people receiving SNAP benefits. It was a very similar task and used census data at the census tract level again. When I made a density raster, however, this is what happened.

 

Density Raster


The raster looks significantly different from, and better than, the Girlstart raster. It is successful because this surface conveys information in a different and effective way. That is, a more continuous surface shows the patterns in a smooth fashion. The data come from census tracts just as with Girlstart, but the polygons in New York are much smaller than those in Austin. That translates to more ‘pieces’ (and more data) to visualize.

 

Scale and size played a major role in my choice between raster and vector, but there are a couple of other criteria to consider. While both my datasets started as vector, the original format of the data is a good hint as to what may be appropriate. This has a lot to do with context. Just as the definitions above suggest, certain topics lend themselves to one format or the other. My demographic topics make sense as vector because census information is gathered from people who live in places that are normally grouped into geographic regions like counties and states. Other subjects, like environmental monitoring, are often rasters because, much like the real world, the earth is a continuous surface. Of course, these are simply general guidelines. It’s all about how you perceive the data and want to visualize it. That last part is key. My first Girlstart raster simply didn’t look right because the unit of analysis (census tracts) was too large to visualize complex variation in the data.

 

Through my experience I’ve identified four good starting points to consider when choosing between raster and vector:

  • Scale and size of features
  • Original formatting
  • Context
  • Aesthetics

Summer of Maps: Lessons in Cartography

This entry is part 2 of 6 in the series Summer of Maps 2014


Lessons in Cartography

Summer of Maps focuses on providing spatial analysis services to non-profits in the form of maps.

While I geocoded addresses, performed kernel density analyses, and converted between vector and raster, none of that means anything unless my maps effectively convey the content. That is, they need to make sense and look awesome. Depending on the ‘where’ and ‘what’ of my maps, I implemented various tips and tricks to make them both beautiful and understandable. One of my mentors, John Branigan, is quite the cartography guru and actually inspired this blog post, so I will start with:

 

1. Lessons from John – When I would bring up a map for John to review, before I had even explained anything he would point out “it’s not projected” or “don’t use red.” His eye for detail is very acute. The following are small tips that make a big difference for the aesthetics of a map.

  • Color – Red can stand out and provide contrast, but it also conveys feelings of negativity or danger; green, on the other hand, carries a positive connotation. When I used green for a layer of income I thought it was appropriate because it reminded me of money. As Cynthia Brewer says in Designing Better Maps, “darker colors are used to represent higher data values, and lighter colors represent lower values” (1161). This scheme, however, did not work for highlighting poverty, because the lighter green areas were the ones I wanted to focus on. Instead, I used an orange/red gradient to emphasize impoverished census tracts, and reserved the green gradient for emphasizing wealth on a fundraising map.
  • Outlines – One of my projects was about New York City, a densely-populated place. My maps could have easily been overwhelmed by the sheer number of census tracts. I avoided this by removing the tract’s outline. This left just the colored polygons, free of distracting boundary lines.
  • Transparency – Another way I was able to convey a lot of information without it getting too “busy” was with transparency. Making a layer 50% transparent lessens the harshness from a strong color while also giving way for other layers to be seen.
  • Basemaps – Basemaps are nice because they give some geographical context. They can also add a lot more like streets, topography, and satellite images. Again, for a place like New York with countless streets, I chose a very simple basemap with minimal labels that would adjust to the scale.

 

2. Multiple Attribute Symbology – Most of the time a layer is symbolized by color, size, or shape alone. There are certain instances, however, when you need to show multiple attributes. I came across this in a couple of my maps. For example, I needed to map event locations by the type of event and the number of participants; that is, to show a quantity and a category simultaneously. I did this by using unique colors for the event type and graduated symbol sizes for participation.
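The two-attribute encoding can be sketched in plain Python (the events, palette, and scaling constant below are all hypothetical; in ArcGIS this is done through symbology settings rather than code):

```python
import math

# Hypothetical events: (event type, number of participants).
events = [("workshop", 30), ("camp", 120), ("demo day", 65), ("camp", 45)]

# Category drives color (one hue per event type)...
palette = {"workshop": "#1b9e77", "camp": "#d95f02", "demo day": "#7570b3"}

# ...while quantity drives size: radius grows with the square root of
# the value so that symbol *area* is proportional to participation,
# a common graduated-symbol convention.
def marker_size(participants, base=6.0):
    return round(base * math.sqrt(participants), 1)

symbols = [(palette[kind], marker_size(n)) for kind, n in events]
print(symbols[0])  # ('#1b9e77', 32.9)
```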

 

3. Labels – It can be important to identify specific features like streets and counties with labels. Unfortunately, labels can pose problems: long names, overlaps, and simply not fitting where you want them. I counteracted these complications with the following tactics.

  • Truncate the Label – A layer of community districts was identified by district code, which combined a number for the borough with the district number. I removed the leading borough number so the label would just be the shortened district number.
  • Convert to Annotation – This is a great trick that allows the labels to be manually edited and moved. I was able to place county names where they didn’t overlap other features and rotated them so they fit nicely.
  • Omitting – Using the “one label per feature” option instead of “one label per feature part” drastically de-cluttered the Jamaica Bay islands.

 

4. Extent – Simplify a map by only showing what needs to be shown. I had specific study areas for my projects, so any layers that spilled out, like a highway, were clipped away. Similarly, I removed parts of a density layer that overlapped water.

 

5. Multiple Data Frames – Including more than one data frame in a map can add detailed views and context.

  • Insets – Maybe there’s an area of the map that is very clustered or is of particular interest. Creating an inset map of the zoomed-in extent avoids squinting!
  • Context Map – A few of my layers were derived from analysis involving various datasets. Instead of writing out many sentences about the process, I made a layout with multiple data frames to visually explain how many layers were added together to produce the main one on display.


Summer of Maps: An ‘Atypical’ Approach to Analyzing Tree Canopy Cover

This entry is part 1 of 6 in the series Summer of Maps 2014


An ‘Atypical’ Approach to Analyzing Tree Canopy Cover

As part of Azavea’s Summer of Maps program, I elected to work with TreePeople, a Los Angeles-based environmental non-profit with a goal of ensuring the sustainable future of L.A. by expanding and maintaining the city’s tree canopy. I have always had an interest in sustainability, so I jumped at the chance to work with such an environmentally-focused organization. TreePeople was seeking GIS analysis to better understand the relationship between the city of Los Angeles’s tree canopy cover (TCC) and its public health and socioeconomic characteristics. Their aspiration for the project was to understand, based on that correlation, which neighborhoods of L.A. they should focus their resources on. Before I even began looking for pertinent data sources, I did some preliminary research on the health and societal benefits of trees, especially on the West Coast and in a city infamous for its air pollution problems. I was pleasantly surprised to learn that many scientists have found evidence that trees can help people live healthier, happier lives. A recent article published in The Atlantic found that U.S. tree canopy cover averts $6.8 billion in health care costs each year, simply by existing.

 

Beyond their environmental effects (trees remove carbon from the atmosphere and are often used as offsets for carbon emissions), many scientists find that trees have positive psychological effects on people, including decreased depression due to the presence of greenery, and are associated with decreased crime rates. In their publication in Landscape and Urban Planning, Troy, Grove, and O’Neil-Dunne (2012) found that in Baltimore, a 10% increase in TCC was associated with a 12% decrease in crime. While some scholars are not fully on board with some of these findings, more and more evidence points toward the positive aspects of having trees in our residential and urban centers.

Because Los Angeles is stricken with severe air pollution problems, one can expect certain respiratory conditions, like asthma, to be very prevalent in the region. In a world where 7 million people die from air pollution every year, according to the World Health Organization’s March 2014 publication, it is crucial to take all measures to reduce pollutants, including planting trees. According to the same Atlantic article, scientists found that in 2010, the presence of trees in the United States prevented 850 human deaths and nearly 670,000 cases of acute respiratory symptoms, like asthma. If more trees were planted, a situation similar to the crime correlation in Baltimore could potentially result.

Using this information on which variables are thought to be directly influenced by the presence of trees, we began searching for data specific to the Los Angeles region. Using the data provided by TreePeople, along with data we found on various websites, such as that of the California Office of Environmental Health Hazard Assessment (OEHHA), we obtained numerical figures for 10 variables, broken down into smaller geographic areas of the city of Los Angeles, including census tract and health district levels.

With the help of TreePeople, we ranked our variables according to their importance and proposed correlation to TCC:

Variable Name           Weighting
Asthma                  15%
Diabetes                15%
Obesity                 15%
Minority races          10%
Traffic density         10%
Linguistic isolation    10%
Poverty                 10%
Unemployment            5%
Low birth weight        5%
Educational attainment  5%

Using the assigned weights, we applied the Weighted Sum tool in ArcMap 10.2 to mathematically combine all of the rasterized factors into one layer and determine which parts of the city had the highest need, that is, where the highest values (out of 100) were found. The result was the following map, where the dark reddish-brown color indicates the priority areas with the highest need.

Weighted Sum map
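The Weighted Sum operation itself is just a per-cell linear combination. Here is a minimal NumPy sketch using three of the ten factors with the weights from the table above; the raster values are invented, and each factor is assumed to be pre-scaled to a common 0-100 need scale:

```python
import numpy as np

# Invented 2x2 "need" rasters for three of the ten factors, each
# already rescaled to a common 0-100 scale.
asthma  = np.array([[80, 20], [60, 40]])
poverty = np.array([[90, 10], [50, 30]])
traffic = np.array([[70, 30], [20, 80]])

# Weights from the table above (a subset; the full analysis used all ten).
weighted_layers = [(asthma, 0.15), (poverty, 0.10), (traffic, 0.10)]

# Weighted Sum multiplies each raster by its weight and adds the results,
# cell by cell, into a single composite need surface.
need = sum(weight * layer for layer, weight in weighted_layers)
print(need)
```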

 

From this map, we were able to determine the neighborhoods in the city that contained the highest values, broken up by the neighborhood council boundaries established by the Los Angeles government. TreePeople will use these designations to decide where to focus their efforts to increase tree canopy by planting trees and raising awareness. We found that the top 5 ranked neighborhoods were all located in South Los Angeles and Downtown Los Angeles, but two neighborhoods in the Panorama City and Winnetka areas also demonstrated a high need for tree plantings, so we chose to study those as well.

Top 5 neighborhoods

Finally, once we had reached this point, it was time to add in the original tree canopy cover data that we had been given by TreePeople. One might believe that starting with this data set would have been the most logical way to tackle this GIS project, but I find that understanding the situation in more depth and then using the tree canopy cover to confirm the findings produces a more comprehensive result.

TCC

While analyzing TCC alone does not pick out the same 5 neighborhoods as an analysis of correlated variables does, the 5 neighborhoods we originally selected are at the lower end of the TCC spectrum, and our approach provides TreePeople with different neighborhoods to investigate that they otherwise may have missed by simply analyzing the map of TCC. Using this ‘different’ method to complete the project shows that one’s first instinct is not the only way to take on a project, and that more communities may benefit in the long run from an atypical analysis. It is important to look at a project from multiple dimensions in order to fully understand and complete it.