This post is part of a series of articles written by 2020 Summer of Maps Fellows. Azavea’s Summer of Maps Fellowship Program is run by the Data Analytics team and provides impactful Geospatial Data Analysis Services Grants for nonprofits and mentoring expertise to fellows. To see more blog posts about Summer of Maps, click here.
In 2015, the United Nations adopted 17 Sustainable Development Goals (SDGs), which represented a “universal call to action to end poverty, protect the planet and ensure that all people enjoy peace and prosperity by 2030.” Considering the mission, it’s no surprise that the SDGs are broad and ambitious in their scope. Goal 1, for example, is to “End poverty in all its forms everywhere,” and Goal 3 is to “Ensure healthy lives and promote well-being for all at all ages.”
To track progress on the SDGs, the UN proposed a set of 231 statistical indicators. These indicators range from health outcomes like infant mortality (Indicator 3.2.1), to economic indicators such as the percentage of the population living in poverty (Indicator 1.1.1), environmental metrics like air quality (Indicator 11.6.2), and geospatial questions like access to all-weather roads (Indicator 9.1.1). The indicators’ data-oriented approach helps ground the broad and, in many cases, vague SDGs in more concrete and measurable terms, but acquiring the data necessary to monitor the indicators at national or global scales is a significant and fundamental challenge.
One potential source of geospatial data for monitoring the SDG indicators is OpenStreetMap. OpenStreetMap (OSM) is a free, editable map of the world that is edited and maintained collaboratively. Since OpenStreetMap first launched in 2004, it has undergone tremendous growth, and the OSM community now includes an estimated 6 million registered users who combine to make over 3 million changes and contributions to the map every day. In recent years, OSM has caught the attention of major corporations like Apple and Microsoft, who now hire editors to contribute to the map so that it can be integrated into their products. OSM hosts vast amounts of crowd-sourced geospatial data from around the world, and that data could be a valuable resource for monitoring some of the SDG indicators.
Together with OpenStreetMap US, an organization dedicated to growing the OSM community in the United States, we set out to understand whether OpenStreetMap could be a viable source of data for measuring SDG indicators that require geographic data. We used data from OpenStreetMap to measure SDG Indicator 11.7.1 – the average share of the built-up area of cities that is open space for public use – in six American cities (three of which we discuss here). Then, we compared those results to measurements made with similar data acquired from each city’s Open Data site.
We made these comparisons for two reasons. First, we wanted to get a sense of how complete the data on OpenStreetMap is compared to similar, “official” data published by the cities themselves. A significant disparity in the results would suggest that the OSM data is not yet reliable enough to measure this indicator, while similar results would suggest that OpenStreetMap is a viable source of accurate data. Second, we wanted to see how the process of collecting data for several cities differed depending on the source. If the process for collecting the OSM data scales more easily to additional cities or regions, we might be willing to tolerate some inaccuracy in the overall results.
We started by collecting the data from OpenStreetMap for each city. We referred to the OSM Wiki and compiled a list of tags for features that could be considered “public open or green space” under the SDG Indicator. Generally, we selected features that consisted of open areas accessible to pedestrians, such as parks, dog parks, gardens, nature reserves, ballfields, playgrounds, plazas, and cemeteries. We chose to include cemeteries because these typically include walking areas open to the public, and we excluded golf courses, even though they are large green spaces, because they are often private, and even public courses tend to require payment for entry. Still, these were subjective decisions, and others replicating this analysis may choose a different set of features.
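As a rough illustration of what such a tag selection can look like, here is a minimal Python sketch (the original analysis was written in R). The key/value pairs are real OSM tags, but the exact selection, the `access=private` check, and the names `INCLUDED_TAGS`, `EXCLUDED_TAGS`, and `is_public_open_space` are assumptions made for this example, not the list we actually used.

```python
# Illustrative tag selection. The tags exist in OSM, but this particular
# list is an assumption for the example, not the analysis's exact list.
INCLUDED_TAGS = {
    "leisure": {"park", "dog_park", "garden", "nature_reserve",
                "pitch", "playground"},
    "landuse": {"cemetery"},
    "place": {"square"},  # one common way plazas are mapped
}

# Golf courses are large green spaces but are left out on purpose.
EXCLUDED_TAGS = {
    "leisure": {"golf_course"},
}


def is_public_open_space(tags):
    """Return True if a feature's tag dict matches the open-space selection."""
    # Explicit exclusions take priority over inclusions.
    for key, values in EXCLUDED_TAGS.items():
        if tags.get(key) in values:
            return False
    # Assumption: skip anything explicitly mapped as private.
    if tags.get("access") == "private":
        return False
    # Keep the feature if any of its tags is in the included set.
    return any(tags.get(key) in values
               for key, values in INCLUDED_TAGS.items())


print(is_public_open_space({"leisure": "park", "name": "Example Park"}))
print(is_public_open_space({"leisure": "golf_course"}))
```

Keeping the inclusions and exclusions in plain dictionaries makes the subjective choices easy to see and easy to change for anyone replicating the analysis with a different definition of open space.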
We then attempted to collect similar features from each city’s Open Data website. Whereas the data we collected from OSM was consistent across cities, the data from the Open Data websites varied, with some cities providing more data than others. The most comprehensive data came from cities that provided land use maps with detailed categories, like Baltimore (Open Baltimore) and Philadelphia (OpenDataPhilly), but even these left out some open space features like plazas. Other cities, like Minneapolis and Houston, only provided data on city parks, meaning large features like cemeteries were left out of the analysis.
We then divided the total area for the two sets of features by each city’s land area to calculate the indicator: the proportion of each city that is public open space. We compared the two sets of measurements and visually inspected the two datasets to see which features might be present in one but not the other.
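The calculation itself is a single ratio: total open-space area divided by land area. The Python sketch below shows it on a made-up toy city; the function names and the numbers are hypothetical, and it assumes the polygons are already in a projected coordinate system with metre units and do not overlap. Real data would need overlapping features (for example, a playground inside a park) merged first so that their area is not counted twice.

```python
def polygon_area(coords):
    """Area of a simple polygon via the shoelace formula.

    coords is a list of (x, y) vertices in a projected CRS (metres).
    """
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def open_space_share(open_space_polygons, land_area_m2):
    """Proportion of a city's land area covered by open-space polygons.

    Assumes the polygons do not overlap one another.
    """
    open_area = sum(polygon_area(p) for p in open_space_polygons)
    return open_area / land_area_m2


# Toy city: 10 km x 10 km of land with two rectangular parks (made-up data).
parks = [
    [(0, 0), (2000, 0), (2000, 3000), (0, 3000)],              # 6 km^2
    [(5000, 5000), (7000, 5000), (7000, 9000), (5000, 9000)],  # 8 km^2
]
share = open_space_share(parks, land_area_m2=10_000 * 10_000)
print(f"{share:.1%}")  # prints 14.0%
```

Running the same function once on the OSM features and once on the Open Data features gives the two percentages that are compared for each city.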
In the following section, we present results from three of the cities we examined: Philadelphia, Minneapolis, and Houston.
The R code used for this analysis can be found on the Summer of Maps GitHub.
In Philadelphia, the two datasets provided very similar results: about 14% of the city is public open or green space according to both the OSM and Open Data features. The most noticeable differences are the golf courses in the northeast and northwest of the city. We deliberately omitted these from the OSM data, because they are not freely accessible by pedestrians, but golf courses are listed as “Active Recreation” in the city’s land use dataset, and they cannot be easily distinguished and separated from ballfields. The OSM data also includes many smaller plazas and parks (including Independence National Historical Park around the Liberty Bell) in the center of the city that are not found in the Open Data dataset.
In Minneapolis, we see a notably higher percentage of public open space from the OSM features (about 12%) compared to the parks data from Minneapolis’ Open Data site (about 10%). The difference comes mostly from the cemeteries, which are present in the OSM data but not the parks data. The parks data also includes two publicly owned golf courses in the north and south of the city, which we omitted from the OSM data, as well as several trails in the west and north that are missing from OSM.
Houston showed a higher percentage of open space in the OSM data (about 7%) compared to the Open Data (about 5%). Most of this difference is due to the classifications of George Bush Park and Bear Creek Pioneers Park in the west of the city. Checking the satellite imagery for the city shows that both of these areas should be classified as open space, suggesting that the higher OSM figure is closer to the true share of open space in Houston.
Using data from OpenStreetMap and municipal Open Data websites, we measured the progress made on SDG Indicator 11.7.1 – the average share of the built-up area of cities that is open space for public use – across several American cities. Comparing the two sets of results showed us that the datasets tended to be similar, but both data sources were missing features that could be considered valuable open space for this indicator. Some parks, plazas, and other open spaces could be found in one source but not in the other. Creating the most complete picture of the open space in a city required combining the datasets and using local knowledge to determine which features should be included or excluded.
More practically, however, collecting the OSM data required much less effort than collecting the Open Data, and the process would scale well to much larger regions, such as entire countries. The OSM data benefits from coming from one source – OpenStreetMap – with data that is recorded with consistent conventions across different parts of the world. Downloading the data was as simple as selecting a list of features and querying OSM using the bounding boxes for each city. Conversely, the collection process for the Open Data required more discretion and manual input. Each city publishes its own Open Data, requiring us not only to navigate to each website to search for and download the relevant datasets, but also to clean the data so that it matched the other datasets as closely as possible. While this process was not too onerous for measuring the SDG indicator in a half dozen cities, it would scale poorly, and performing the same analysis across a larger region like an entire country would require significant effort.
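To make the "list of features plus bounding box" idea concrete, the Python sketch below builds one Overpass QL query per city. It is only an illustration of why the OSM route scales: our own code was written in R and may have retrieved the data differently. The tag list, the helper name `build_overpass_query`, and the bounding boxes are assumptions; the boxes are rough, hand-picked values, not the ones used in the analysis.

```python
# Illustrative (key, value) tag pairs; an assumed selection for this example.
TAG_SELECTORS = [
    ("leisure", "park"),
    ("leisure", "dog_park"),
    ("leisure", "garden"),
    ("leisure", "nature_reserve"),
    ("leisure", "pitch"),
    ("leisure", "playground"),
    ("landuse", "cemetery"),
    ("place", "square"),
]


def build_overpass_query(bbox, selectors=TAG_SELECTORS, timeout=180):
    """Build an Overpass QL query for tagged ways and relations in a bbox.

    bbox is (south, west, north, east) in decimal degrees, which is the
    order Overpass QL expects.
    """
    south, west, north, east = bbox
    box = f"({south},{west},{north},{east})"
    lines = [f"[out:json][timeout:{timeout}];", "("]
    for key, value in selectors:
        # Areas in OSM are mapped as closed ways or multipolygon relations.
        for element in ("way", "relation"):
            lines.append(f'  {element}["{key}"="{value}"]{box};')
    lines.append(");")
    lines.append("out geom;")  # return geometries along with the tags
    return "\n".join(lines)


# Rough bounding boxes for illustration only (south, west, north, east).
CITY_BBOXES = {
    "Philadelphia": (39.86, -75.29, 40.14, -74.95),
    "Minneapolis": (44.89, -93.33, 45.06, -93.19),
    "Houston": (29.52, -95.79, 30.11, -95.01),
}

# The same function works for every city; only the bounding box changes.
queries = {city: build_overpass_query(bbox)
           for city, bbox in CITY_BBOXES.items()}
print(queries["Philadelphia"])
```

Adding another city means adding one more bounding box. A bounding box also covers land outside the city limits, so the results would still need to be clipped to the city boundary before calculating the indicator.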
Overall, we learned that OpenStreetMap is a viable source of data for tracking this SDG indicator. In the cities we examined, OSM provided data on public open or green spaces that was similar in quality to more “official” sources like municipal Open Data portals, and collecting the data from OSM required relatively little effort.