Azavea Atlas

Maps, geography and the web

Five Ways Your Foundation or Nonprofit Can Get Started With Spatial Analysis

Taking advantage of new technology in your foundation or nonprofit can sometimes be a difficult process. Fortunately, the Knight Digital Media Center (KDMC) hosts a series of workshops across the US for nonprofit foundations to learn about technology. I was able to speak at the most recent KDMC workshop in Charlotte, North Carolina, along with Amy Gahran, independent journalist, Dan X. O’Neil, Executive Director of the Smart Chicago Collaborative, and Sarah K. Goo, creator of the Pew Research blog Fact Tank.

In my workshop I talked about five ways nonprofits and foundations can get started with spatial analysis, and I used three case studies from Azavea’s Summer of Maps program to help.

The bicycle crash maps I made last summer for the Bicycle Coalition of Philadelphia fostered a public conversation about crash reporting policies, raising awareness about an issue the Bike Coalition cared about. The spatial analysis work I completed for the Greater Philadelphia Coalition Against Hunger as a Summer of Maps Fellow helped amplify their door-to-door efforts and target their limited advertising dollars. Finally, 2013 Summer of Maps Fellow Lena Ferguson’s work for DVAEYC had a huge impact, helping win one million dollars in grant money to improve early childhood education programs in Philadelphia.

Your foundation or nonprofit may be interested in taking advantage of spatial analysis to raise your profile, amplify your message, and target your efforts. Here are five steps you can take to get started:

  1. Take the Maps and the Geospatial Revolution class at Coursera to learn more about spatial analysis. The course is online and free, and takes about five weeks to complete. Have a data analyst in your organization take it too.
  2. Collect address-level data about every interaction your organization has with clients and the public, because addresses need to be in a specific format to be put on a map. Here are some best practices for preparing and maintaining your organization’s address data.
  3. Check out TechSoup to find discounted licenses for ArcGIS, or download the free and open-source alternative, QGIS. You may not be ready to use a GIS desktop software now, but having one on hand will enable an analyst, consultant, or intern to get started working on spatial analysis right away.
  4. Check out the presentation, sample maps, and some resources I collected for the workshop with this Bitbucket of links.
  5. Tell each of your grantees (if you’re a foundation) to do steps 1-4.

The five steps above should help your organization leverage the power of spatial analysis. You may already have questions about your data. If so, consider applying to the Summer of Maps program. Your organization will have a chance to receive pro bono spatial analysis from Azavea-mentored Fellows. If you’d like to learn more about spatial analysis or Summer of Maps, send me an email at tdahlberg@azavea.com.

Summer of Maps: Daytime Population Estimation and its Effect on Risk Terrain Modeling of Crime

This entry is part 6 of 6 in the series Summer of Maps 2014

Summer of Maps logo

Now in its third year, Azavea’s Summer of Maps Program has become an important resource for non-profits and student GIS analysts alike. Non-profits receive pro bono spatial analysis work that can enhance their business decision-making processes and programmatic activities, while students benefit from Azavea mentors’ experience and expertise. This year, three fellows worked on projects for six organizations that spanned a variety of topics and geographic regions. This blog series documents some of their accomplishments and challenges during their fellowship. Our 2014 sponsors, Google, Esri, and PennDesign, helped make this program possible. For more information about the program, please fill out the form on the Summer of Maps website.

 

When using Census data for research or analysis, sometimes the standard total population count for a region just doesn’t suffice. Transportation planners and crime analysts, for example, must account not only for residential populations but also “daytime” or “commuter-adjusted” population data, since many people spend most of their days working or running errands in different Census tracts, different towns, or even different regions from their homes. Nowadays we’re always on the go, so shouldn’t our population data reflect that?

I encountered this daytime population issue this summer while working as a Summer of Maps fellow with DataHaven to analyze the geographies of crime risk in New Haven, Connecticut. In this project, we used the Risk Terrain Modeling Diagnostics (RTMDx) Utility, a software application that uses crime data and the spatial influences of potential “risk factors” to model where conditions are ripe for crimes to occur in the future. One of the influencing factors of crime we used in our analysis was population density. While the exact effect of population density on crime rates is the focus of ongoing criminology research, in our study, we proposed that crimes would occur where high volumes of people were located. Since our focus here was on where people are “located” and not necessarily where they “live,” we incorporated commuter-adjusted population estimates to account for New Haven’s daytime population.

To acquire daytime population estimates, some data assembly is required. The Census Bureau provides instructions on calculating daytime population estimates using American Community Survey or Census 2000 population data. My first step in calculating daytime population was to download workplace geography data from the Census Transportation Planning Products (CTPP), which include Census data particularly useful to transportation planners. I selected “2006-2010 CTPP Tract-Tract work flow data” and followed the download instructions to get tract-to-tract population flow counts from residences to workplaces. I then queried the Access database to extract all records with either a residence tract or a workplace tract located in Connecticut, in order to account for interstate commuting. With these statewide commuter counts, I was able to home in on New Haven Census tracts and calculate the total number of workers working and/or living in New Haven. Lastly, I used the Census Bureau’s “Method 2” for calculating daytime population:

Total resident population + Total workers working in area – Total workers living in area
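The assembly steps above can be sketched with pandas. The column names, tract IDs, and worker counts below are hypothetical stand-ins for the actual CTPP fields; the one real detail relied on is that Connecticut tract IDs share the state FIPS prefix “09”:

```python
import pandas as pd

# Toy tract-to-tract commuter flows (hypothetical columns and values).
flows = pd.DataFrame({
    "res_tract":  ["09009140100", "36061010100", "09009141400", "25025010100"],
    "work_tract": ["09009140200", "09009140100", "36061010200", "25025010200"],
    "workers":    [120, 35, 50, 80],
})

# Keep any record whose residence OR workplace tract is in Connecticut,
# so interstate commuting is accounted for.
ct = flows[flows["res_tract"].str.startswith("09")
           | flows["work_tract"].str.startswith("09")]

def daytime_population(resident_pop, workers_in, workers_out):
    """Census Bureau Method 2 commuter-adjusted estimate."""
    return resident_pop + workers_in - workers_out

# For one illustrative tract: workers commuting in vs. workers living there.
tract = "09009140100"
workers_in = ct.loc[ct["work_tract"] == tract, "workers"].sum()
workers_out = ct.loc[ct["res_tract"] == tract, "workers"].sum()
estimate = daytime_population(5000, workers_in, workers_out)
```

In the real analysis these sums would be taken over all New Haven tracts rather than a single illustrative one.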

With both resident and commuter-adjusted population counts available, the next stage of the analysis was to incorporate this data into the RTMDx Utility. I created risk terrain surfaces across four crime types (robbery, burglary, simple assault, and assault with a dangerous weapon) and two population counts (resident and daytime populations), producing eight risk maps in total. Each risk terrain model (RTM) included five risk factors in addition to the population count: foreclosures, bus stops, schools, parks, and job locations related to retail, entertainment, and food service (provided by Census LODES data via DataHaven).

In the figures below, we can compare the risk terrain surfaces created for assaults using resident population and daytime population. The surfaces are displayed with the “relative risk score” produced by the RTMDx Utility: an area with a risk value of 50 has an expected rate of assault 50 times higher than an area with a score of 1. The higher the score, the greater the risk of an assault based on the model.

[Figure: assault risk terrain surface modeled with resident population]

[Figure: assault risk terrain surface modeled with daytime population]

 

In comparing the geographies of crime risk between resident and daytime population counts in central New Haven, we see generally higher risk scores when resident population is modeled. The heavily residential neighborhoods surrounding downtown New Haven, including Fair Haven, Newhallville, and Edgewood, see sharply higher risk scores when resident population is considered, perhaps because many residents there commute to jobs in other neighborhoods or cities during the day. The effect of population is more difficult to gauge in downtown New Haven, which is dominated by Yale University and Yale-New Haven Hospital, the city’s two largest employers. Despite a much larger daytime than resident population there, assault risk scores decreased when accounting for daytime population. This could be due to the nature of assault crimes in relation to population density, the geography of assault incidents in our crime dataset, the role of uncounted university students in influencing assault patterns, or other issues. Our results demonstrate that while daytime population is an important element to consider in risk terrain modeling, crime risk analysis remains a complex and inexact science.

While some spatial analyses may not require the granularity of daytime population estimates, using commuter-adjusted population data has important implications when exploring time-sensitive phenomena like crime or transportation dynamics. Although the Census may not be able to account for population spikes associated with university students, tourism, or shopping, CTPP data still gets closer to understanding where people spend their days outside of the home.

Summer of Maps: A Spatial Tale of Five Cities

This entry is part 5 of 6 in the series Summer of Maps 2014


 

One of my two projects this summer as a fellow here at Azavea was working with CBEI, the Consortium for Building Energy Innovation, located here in Philadelphia. CBEI wanted a national view of building energy use and the potential for reducing energy consumption in five cities: Philadelphia, Washington, D.C., New York City, Minneapolis, and San Francisco. Over the last few years, cities and states have passed benchmarking and disclosure laws that require the owners of buildings – commercial, municipal, private, public, and non-residential – to report their annual energy use. Thanks to this recent wave of published data, the potential now exists for a comprehensive analysis across many cities. Because the data were published so recently, this project is certainly one of the first of its kind, and it has already garnered interest from various city governments eager to learn about the results and the processes used throughout the project.

Comparing five cities is not always an easy task, however. The five cities we investigated are located in different parts of the U.S., differ in size, and contain different numbers of census tracts – a common subdivision of a county with a population generally between 1,200 and 8,000 people. Most importantly, the benchmarked building stock for each city was not the same. While we initially embarked on the project with the goal of mapping benchmarked commercial buildings, we were able to find only one published data set of this type, simply because the others have not yet been made public. See the following table for a description of the type of building stock we analyzed for each city:

 

City Name         Type of Buildings (number)
Philadelphia      Commercial (1,171)
New York City     Non-Residential (2,240)
Washington, D.C.  Private (490)
Minneapolis       Public (101)
San Francisco     Municipal (431)

 

Although there was some overlap in the types of buildings analyzed for each city, it is important to note that each city published a different slice of its building stock, so the data sets compared here are not strictly equivalent and might not have been matched this way had more data been available. Additionally, the data sets were self-reported by building owners, which means errors may – and do – exist. Nonetheless, we set out to break down each city’s building stock by five key variables related to energy efficiency: greenhouse gas (GHG) emissions, weather normalized source energy use intensity (EUI), ENERGY STAR score, building size, and year built. We chose these five variables because they are good representations of energy efficiency and were largely available for all five cities – with the exception of year built for NYC. Mapping each city individually was simple and informative. We generated one map per variable, per city, such as the one below.

[Figure: greenhouse gas emissions map of benchmarked buildings, Philadelphia]

As part of our analysis we wanted to create composite maps of both the cities and the variables in order to more directly compare and contrast them. When it came time to place all five cities, showing the same variable, on one map, we noticed a major flaw: the legend values were completely different, because each had been computed for its own city, so the highest category of values for one city could fall in the middle of the range for another. To correct this, we created an Excel spreadsheet with every city’s values for each variable and used the quantile tool to redistribute the values across all five cities. Quantile classification is a great way to represent a dataset because, rather than dividing the data by arbitrary intervals, it divides the ranked values into equal-sized groups; with four classes (quartiles), each class holds 25% of the values: 0-25%, 25-50%, 50-75%, 75-100%. I initially thought this method would not represent the data well, since the highest values for some cities are so much larger than those of a smaller city like Minneapolis, but because quantiles use percentages rather than fixed intervals, it turned out to be a great way to display the data accurately.
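The pooled quantile approach can be sketched with NumPy. The city values below are illustrative stand-ins, not the actual benchmarking data; the point is that the class breaks are computed from the combined distribution so every city shares one legend:

```python
import numpy as np

# Illustrative EUI values for a few cities (not real data).
city_values = {
    "Philadelphia": [80, 95, 110, 140],
    "Minneapolis":  [60, 70, 85],
    "New York":     [120, 150, 300, 400],
}

# Pool all cities' values so the breaks reflect the combined distribution.
pooled = np.concatenate(list(city_values.values()))

# Quartile breaks: each of the four classes holds 25% of the pooled values.
breaks = np.quantile(pooled, [0.25, 0.5, 0.75])

# Assign every building a class from 1 to 4 using the shared breaks.
classes = {city: np.digitize(vals, breaks) + 1
           for city, vals in city_values.items()}
```

With shared breaks, a mid-range Minneapolis building and a mid-range New York building land in classes that are directly comparable on one map.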

[Figure: composite weather normalized source EUI map across the five cities]

 

While it can be tricky to display cities of different sizes, showing slightly different information, on one map, it is worth the effort to normalize all of the data in order to accurately make comparisons. It can be truly fascinating to see how two cities, thousands of miles apart, relate in terms of energy efficiency – how they are similar and how they are different. I hope that this project will lead to similar initiatives in the future to improve energy efficiency and reduce costs, especially as more data sets become benchmarked and released to the public.

Announcing the FOSS4G North America 2015 Program Committee

It’s my honor to introduce the Program Committee for FOSS4G North America 2015. The committee brings together a variety of perspectives from the geospatial open source community that will inform the creation of a great program.

Kate Chapman

Kate Chapman is a founder and the Executive Director of the Humanitarian OpenStreetMap Team (HOT), a non-governmental organization dedicated to helping support communities, governments, and humanitarian responders in their use of OpenStreetMap for crisis response and contingency planning. She was a keynote speaker at FOSS4G 2013 in Nottingham and is currently on the steering committee for FOSS4G Asia, to be held in Bangkok, Thailand in December 2014 (that conference is still accepting abstracts; submit here: http://www.foss4g-asia.org/2014/programmenu/submit-paper/).

Regina Obe

Regina is a member of the PostGIS Steering Committee and has been an OSGeo Charter Member since 2009. She is coauthor of PostGIS in Action (Manning Publications, 2011) and co-owner of the Paragon Corporation, a Boston-based consulting firm.

Jody Garnett

Jody is a senior software engineer working with Boundless who serves on the Project Steering Committees of the GeoServer, GeoTools, and uDig projects. He has been an OSGeo charter member since 2006 and actively participates in outreach to new projects as a member of both LocationTech and OSGeo.

Andy Petrella

Andy is the Lead Developer at NextLab in Liège, Belgium. He specializes in GIS software development as well as distributed systems, and holds degrees in Mathematics and Informatics with a specialization in Geometrology-Geomatics Sciences from the University of Liège.

Beth Tellman

Beth is a National Science Foundation Fellow and Gilbert White Fellow as a doctoral student in Geography at Arizona State University. She uses geospatial data and software to research how changes in land use impact ecosystem services for flood mitigation. She co-founded and directed an NGO in 2009, the CEIBA Foundation, to facilitate rural Salvadoran communities in attaining disaster resilience. She holds an MESc in Environmental Science from the Yale School of Forestry and Environmental Studies.

Rob Emanuele (Chair)

Rob is a geospatial software developer working at Azavea in Philadelphia. He is the lead of the GeoTrellis project and is currently guiding it through LocationTech incubation.

Summer of Maps: Creating a Cost Distance Surface to Measure Park Access

This entry is part 4 of 6 in the series Summer of Maps 2014


 

When you’re navigating the real world, the shortest distance between Point A and Point B is rarely a straight line. Instead, there are twists and turns in the path, difficult terrain, and impassible roadblocks that force you to detour and slow your journey (a lot like life!). This reality has important implications for understanding how close you are to things, whether you’re planning a cross-country road trip or simply walking to your closest neighborhood park.

A common way to measure proximity in a GIS analysis is with a simple buffer tool, which measures a straight, set distance away from the point, line, or polygon boundary of interest and creates a polygon covering that space. This tool has practical applications when you’re looking at basic geographic proximity, but in many cases, measuring straight-line or “as the crow flies” distance just doesn’t make for the most robust analysis.
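The straight-line notion of proximity that a buffer captures boils down to a Euclidean distance check, as this tiny sketch shows (the function and values are hypothetical, purely for illustration):

```python
import math

def within_buffer(x, y, center_x, center_y, radius):
    """True if point (x, y) lies within a straight-line ("as the
    crow flies") buffer of the given radius around the center."""
    return math.hypot(x - center_x, y - center_y) <= radius

# A park entrance 3 blocks east and 4 blocks north is 5 blocks away
# in straight-line terms -- inside a 6-block buffer -- even if the
# actual walking route along streets is much longer.
within_buffer(3, 4, 0, 0, 6)
```

The limitation is exactly what the buffer tool shares: the check knows nothing about streets, terrain, or roadblocks between the two points.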

One of these cases arose from my work this summer with the Community Design Collaborative to prioritize parks, playgrounds and schoolyards in Philadelphia for potential grants for revitalization or redesign efforts. Among the many factors that were considered for selecting candidate sites was park accessibility. We wanted to visualize where people in Philadelphia are far from existing parks so that we could identify schoolyards or community gardens in these parts of the city that could potentially be redesigned as neighborhood play spaces to serve communities lacking a nearby park. Since we are studying an urban environment and are particularly focused on access for children, we considered walking along sidewalks to be the dominant form of travel to parks.

One option for incorporating park access based on walking would have been to use Network Analyst tools in ArcGIS to develop sophisticated “service areas” for each park using the street network. The issue with this option, in this case, was simply the scale of the operation: with over 400 parks included in our analysis, the service area calculation would have been far too time-consuming and prone to ArcGIS crashes due to system overload. A more manageable way to illustrate park access using the street network was to create a cost distance surface that makes traveling along sidewalks the preferred method of travel.

To do this, we first examined a dataset of Philadelphia streets, acquired from OpenDataPhilly, and eliminated highways and urban freeways from the data, as these cannot serve as walkable routes to local parks. We then converted the streets dataset into raster format and used the Euclidean Distance tool to create a surface of straight-line distances from each cell to a street. The resulting raster was then classified on a scale of 1 to 6 based on its distance values to represent difficulty of travel, with a value of 1 being easiest to travel and 6 being most difficult. This classification establishes the street network and its sidewalks as the preferred route for travel. The map below shows what this classification looks like for a portion of West Philadelphia:
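The distance-then-reclassify step can be sketched with NumPy and SciPy. The toy raster and break values below are illustrative, not the project’s actual data or thresholds:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Toy raster: 1 marks a street cell, 0 marks everything else.
streets = np.zeros((7, 7), dtype=int)
streets[3, :] = 1  # one horizontal street across the grid

# Euclidean distance from every cell to the nearest street cell
# (street cells themselves get a distance of 0).
dist = distance_transform_edt(streets == 0)

# Reclassify distances onto a 1-6 difficulty scale; illustrative breaks.
breaks = [1, 2, 3, 4, 5]
difficulty = np.digitize(dist, breaks) + 1
```

Cells on the street get difficulty 1, and difficulty rises with distance from the network, mirroring the 1-to-6 classification described above.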

[Figure: street distance classification for a portion of West Philadelphia]

Our next step was to incorporate the park locations for the cost distance calculation. To both simplify our parks data and simulate access to park entry points, we extracted vertex points from our park polygon dataset and set them to be the “source” dataset, or the locations from which cell distances are measured, in the cost distance operation. We then applied the Cost Distance tool in ArcGIS to measure and record distances between each cell and the nearest park access point using the easiest or “cheapest” route. In other words, the distance of a vertex-to-cell route that covers the easiest terrain (for example, a 1- or 2-scored area on the street’s Euclidean distance layer) will be selected as that cell’s cost distance value, while an alternate route over more difficult terrain will not be (perhaps even if the difficult-terrain route is shorter in straight-line distance). For a more detailed breakdown of the algorithm generating cost distance surfaces, ArcGIS Resources provides explanations and examples of how node-to-node cost distance calculations are performed in the software.
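Under the hood, a cost distance calculation is essentially a Dijkstra-style accumulation over the raster. The sketch below is a simplified stand-in for the ArcGIS tool, not its actual implementation; it follows the common convention of averaging the two cells’ costs per move and scaling diagonal moves by the square root of 2:

```python
import heapq
import math

def cost_distance(cost, sources):
    """Accumulated least-cost distance from any source cell to every
    cell of a 2D cost raster. `cost` is a list of per-cell traversal
    costs; `sources` is a list of (row, col) source locations."""
    rows, cols = len(cost), len(cost[0])
    acc = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        acc[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > acc[r][c]:
            continue  # stale heap entry; a cheaper route was found
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                # Average the two cells' costs; diagonal moves cover
                # sqrt(2) times the distance of orthogonal moves.
                step = (cost[r][c] + cost[nr][nc]) / 2
                if dr and dc:
                    step *= math.sqrt(2)
                if d + step < acc[nr][nc]:
                    acc[nr][nc] = d + step
                    heapq.heappush(heap, (d + step, nr, nc))
    return acc

# A high-cost "city block" in the middle column forces the cheapest
# route to detour around it, just as walkers detour along streets.
surface = [[1, 9, 1],
           [1, 9, 1],
           [1, 1, 1]]
acc = cost_distance(surface, [(0, 0)])
```

In this toy surface, the accumulated cost to the top-right cell reflects the detour around the 9-valued cells rather than the shorter straight-line route through them.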

The map below shows the resulting raster illustrating park proximity. You will notice that the edges of each data class appear to hug the street centerlines, showing that routes along streets and sidewalks are preferred to unrealistic routes that cut through city blocks.

[Figure: cost distance surface showing park proximity across Philadelphia]

With this realistic dataset of park access, we were able to generate a clear picture of the walkable landscape of Philadelphia and identify areas of the city where walking to the park can be a long and tedious trip for children and families. Altogether, the cost distance tool incorporates both distance and terrain difficulty to illustrate real-world proximity and what it really takes to get from Point A to Point B.