Azavea Atlas

Maps, geography and the web

Meet Hopeworks: the Camden nonprofit offering GIS and web development training to youth

Hopeworks Logo, skyline, world, and mouseRecently, I was invited to tour Hopeworks ‘N Camden and had the opportunity to meet Executive Director Dan Rhoton and GIS Director Luis Olivieri. Hopeworks does truly amazing work in Camden and I hope you enjoy learning about their organization. My interview with Dan and Luis is below.  

What is the Hopeworks organization and what do you do?

Hopeworks ‘N Camden is a nonprofit that has been working for over 15 years with Camden youth.  Utilizing an advanced training curriculum in website design/development, Geographic Information Systems (GIS) and Salesforce, Hopeworks works with youth 14-23 to get back in school and find a safe pathway to their future using the Sanctuary model!   Recently, the Greater Philadelphia Chamber of Commerce named Hopeworks ‘N Camden their 2016 Nonprofit Organization of the Year!

As part of it’s program, Hopeworks runs four businesses, offering a complete array of website design & development, along with online GIS mapping products, and a complete set of Salesforce implementation, optimization, training, and support offerings.  In addition, Hopeworks Youth Healing Team offers training and consultation to schools and nonprofits about working with youth and adults affected by trauma.  These businesses help to generate over 50 youth jobs a year.  

Can you tell us a little bit about the history of Hopeworks and how GIS became one of the focuses of your training and client work?

In 1999, members of three churches in north Camden, the most blighted section of the city, came together and organized the Hopeworks program. Our original concept was to offer training to high school dropouts. During 2000, we formed a Board of Directors, hired an Executive Director, wrote grant proposals, and got startup grants from the Campbell Soup Company and the Society of Jesus.  In March 2000, we opened our doors to the first trainees and started our non-profit website design business.

Early in our experience, we learned three vital things. First, we saw that our youth needed more than training; they also needed help in setting goals, making plans, and sticking to those plans. Second, given the weak academic backgrounds of our Camden youth and the advancing complexity of website design, we realized they would need college degrees to get jobs in website design or other high-tech fields. Third, we confirmed that in-school youth needed additional motivation to stay in school.  

Thanks to Matt Grove, a Univ of Pennsylvania Grad student who loved GIS, volunteered and Hopeworks expanded its tech expertise from Web into GIS early in 2003.  Our clients have included the City of Camden, the EPA, and many others.

In 2012, we made another important discovery.  Training just was not enough.  Our youth were successful, but many more dropped out, did not complete the training, or were not successful in internships.  Technical training was not enough.  We also needed to help our youth heal.  Hopeworks made the decision to become a certified Sanctuary organization.  What does this mean?  It means we have radically shifted our values/priorities/interactions to a “trauma” framework.  

Students doing fieldwork, data collection in wooded area.

What do you consider to be your greatest successes relating to the GIS work you do?

The fact that our youth are able to collect and process data faster, more accurately, and more economically than professional staff is powerful!

Where do you see your organization, and specifically, the GIS team heading in the next five years?

Hopeworks has been invited to grow both in Camden and replicate our model regionally.  The next few years are exciting!

We would like to have a GIS Program that is recognized as a good alternative to traditional consulting services in the area of GIS.  We want people to know that when they hire our GIS Program at Hopeworks they can expect a very professional level of service, at a very competitive cost.  

How can members of the greater Philadelphia tech or GIS community volunteer their time or get involved in your work?

We have so many ways to volunteer, mentor, and support our work!  We need help to grow!

Even though GIS has been around for years it’s still a technology that is developing.  Keeping up with everything is very hard.  We could use people that can help us with trainings or just by volunteering time to help the GIS Team.  

One example of how volunteers are helping is our GIS Advisory Board, a group of well known professionals in the area of GIS from academia, government and industry. The board provides advice, support, guidance and recommendations which are key components for the development of the GIS Program at Hopeworks ‘N Camden. Chalkboard with Hopes for Camden written on

If you’re interested in volunteering, get in touch with Hopeworks! Also check out their upcoming hackathon on April 16th!

Lastly, what is your dream for Camden?  

We dream of a Camden where our youth are starting their own tech entrepreneurial ventures, healing both their families and their communities!

Registration and Conference Grant Applications Now Open for FOSS4G-NA 2016!

logo

As you may already know, this year’s Free and Open Source Software for GeoSpatial North America Conference (FOSS4G-NA) is located in Raleigh, North Carolina and will be held from May 2-5th.  As the Community Chair of the conference, I’m really pleased to let everyone know that registration is now open!  It’s also not too late to submit a presentation proposal or vote on the proposals.

I’d also like to share some information about the conference’s scholarship program. We realize that traveling to and attending a conference can be expensive. Not everyone who uses or develops open source software has someone to pay their way for them or the means to pay for it themselves. To help, FOSS4G-NA 2016 will be offering scholarships/conference grants to some attendees in need.

What is included?

Conference grants may include one or more of the following:

  • complimentary full access conference passes
  • lodging in the conference hotel
  • In exceptional cases, we may subsidize airfare or other transportation costs.
Who is eligible?

Postcard of Raleigh NC, Photo from Boston Public Library Flickr

The travel grants are available to anyone, not just students, but their purpose is to help those attend who would not have otherwise been able to. If you have the means to attend on your own, please do not apply. Because we believe that a diverse community is a stronger community, we’ve dedicated at least 60% of our funds for women and minority attendees. We believe in working towards ending the underrepresentation of these people in the open source community.

How do I apply?
  • Apply starting on January 8th, 2016
  • Be sure to apply by January 22, 2016
  • Applicants will be notified of their status by end of February, 2016.

Visit this form to Apply NOW

We hope to see you in Raleigh, North Carolina!

Questions? Contact us at foss4g@locationtech.org

Using PostGIS to Summarize Philadelphia Inspections Data

Violations from Nat Henry's 2015 Summer of Maps Project for Legal Clinic for the Disabled

Violations from Nat Henry’s 2015 Summer of Maps Project for Legal Clinic for the Disabled

In August of 2015, Summer of Maps Fellow Nat Henry wrote a Python script to collect the entire universe of available Philadelphia License and Inspections Department data and hosted it on Github as part of a project for the Legal Clinic for the Disabled. There were more than 1 million inspection records in the database from January 2003 to August 1, 2014. The combined CSV text file is about 330mb in size, and balloons to more than 5 gigabytes when converted into a shapefile. A CSV that large will frequently crash Excel, and a million record file starts to slow desktop GIS noticeably (and also quickly fill up a hard drive with intermediate files). I wanted to try and use the spatial Postgres plugin, PostGIS to more efficiently clean up and query the data.

Getting the Data

You’ll first need to download the data from the Summer of Maps Github page specifically for Licenses and Inspections. After you unzip the data locally on your computer, you’ll see that it has been split into a 12-part file, and a 3-part file. I decided to use the 3-part file, and used Excel to copy-paste the data into one spreadsheet. You’ll now have one unwieldy 330mb CSV. I already had the Philadelphia Neighborhoods shapefile from Azavea loaded into my database. Now let’s get the inspections data into a PostGIS-enabled Postgres database.

Loading into PostGIS

In a previous blog post I described how to use QGIS to load spatial data into PostGIS databases. I followed that same method here, loading the CSV into QGIS and converting it to a shapefile, and then importing it into PostGIS with QGIS’ SPIT function. There are definitely other and better ways of doing this, but using QGIS is one of the most straightforward.

Cleaning the Table

If all goes well, you should now be able to use QGIS’ DB Manager utility, or PGAdmin to explore your database and begin writing SQL queries. First I dropped a number of unnecessary fields to cut down on clutter and reduce table size:

ALTER TABLE incidents_wgs84
DROP "ViolationI",
DROP "ViolationL",
DROP "CaseNumber",
DROP "LocationID",
DROP "CaseType",
DROP "CaseLocati",
DROP "ViolationD",
DROP "LastUpdate",
DROP "Resoluti_1",
DROP "StNum",
DROP "StDir",
DROP "StSuffix",
DROP "field_28"

Converting the Date Field

There are over a million records in the database, so I wanted to limit my queries to any inspections from 2014 and later. This turned out to be much more difficult than expected, mainly because the date is stored in a Unix format called, ‘Epoch’, where the dates are stored as milliseconds since 1970, not to mention the field had some extra characters that needed to be removed.

I removed the extra characters from the “AddedDate” field with the following two queries:

update incidents_wgs84 SET "AddedDate" = REPLACE("AddedDate", '/Date(','');
update incidents_wgs84 SET "AddedDate" = REPLACE("AddedDate", ')/','');

Now we should just be left with a giant string of numbers, still stored incorrectly. The next query will convert the string to a big integer (to store huge numbers), and then convert to a timestamp. It also uses a conditional regex statement I found on StackExchange to handle null value records so they don’t break the query.

I start out queries using a “WITH” statement. This is from a special syntax called Common Table Expressions (CTE) that allows you to write more procedural-style SQL queries. While less efficient, the queries are much easier to read and construct. You can learn more about CTEs at Craig Kerstiens’ blog.

WITH clean_table AS
(SELECT *, to_timestamp("AddedDate"::BIGINT/1000) FROM incidents_wgs84 WHERE "AddedDate" ~ E'^\\d+$')

This cleans up and converts the date field into an SQL variable called “clean_table” which I’ll continue to build on in this post.

This next query uses the “clean_table” and selects only those dates after December 31, 2013 to narrow down the number of records:

recent_violations AS
(SELECT * FROM clean_table WHERE to_timestamp > '2014-12-31')

When I put both queries together, it should look like this:

WITH clean_table AS
(SELECT *, to_timestamp("AddedDate"::BIGINT/1000) FROM incidents_wgs84 WHERE "AddedDate" ~ E'^\\d+$')

recent_violations AS
(SELECT * FROM clean_table WHERE to_timestamp > '2014-12-31')

Great, now we can build on those two queries to start actually asking questions about the data.

Summarizing the Data

One of the best parts about Postgis is that you can ask questions fairly quickly without any of the interstitial cruft that desktop GIS tends to accumulate. If you catch a mistake or want to change something in PostGIS you can just modify the code and run it again. Meanwhile, in desktop GIS you might create five different intermediate layers only to realize you made a mistake and have to repeat everything. Let’s build some Postgis queries to summarize the 2014-2015 L&I data.

Violations Per Neighborhood

This query performs a spatial join between the L&I violations and the Philadelphia neighborhoods shapefile. The

ST_Transform(recent_violations.the_geom),2272

nested inside the spatial intersect (ST_Intersect) ensures that the incidents are projected into the local Philadelphia state plane coordinate system, EPSG 2272, to match the neighborhoods shapefile. The full query is below:

WITH clean_table AS
(SELECT *, to_timestamp("AddedDate"::BIGINT/1000) 
FROM incidents_wgs84 WHERE "AddedDate" ~ E'^\\d+$'),

recent_violations AS
(SELECT * FROM clean_table WHERE to_timestamp > '2014-12-31')

SELECT hoods."MAPNAME", COUNT(recent_violations.gid)
FROM recent_violations
JOIN "Neighborhoods_Philadelphia" AS hoods
ON ST_Intersects(ST_Transform(recent_violations.the_geom,2272),hoods.the_geom)
GROUP BY hoods."MAPNAME"
ORDER BY COUNT(recent_violations.gid) DESC
Limit 10

And here’s the result (I copied the table and used the Tables Generator website to convert it to Markdown):

Neighborhood Violations
Strawberry Mansion 2223
Mayfair 2082
Wissinoming 1983
Cobbs Creek 1882
Oxford Circle 1869
Point Breeze 1830
Frankford 1764
West Oak Lane 1759
North Central 1737
Haddington 1655

Most Common Violation Per Neighborhood

It’s great to see the number of violations per neighborhood, but I wanted to dig a little bit deeper. For instance, what is the most frequent violation per neighborhood? While Postgres doesn’t have a built in mode (statistically most frequent) function built in, I approximated one with this helpful stackoverflow post. I continue to use the CTE query style, calling the spatial join I did in the previous step “AS spatial_join” to use the result later for the SELECT DISTINCT:

WITH clean_table AS
(SELECT *, to_timestamp("AddedDate"::BIGINT/1000) 
FROM incidents_wgs84 WHERE "AddedDate" ~ E'^\\d+$'),

recent_violations AS
(SELECT * FROM clean_table WHERE to_timestamp > '2014-12-31'),

spatial_join AS (
SELECT hoods."MAPNAME", COUNT(*) as violcount, "Violatio_1"
FROM recent_violations
JOIN "Neighborhoods_Philadelphia" AS hoods
ON ST_Intersects(ST_Transform(recent_violations.the_geom,2272),hoods.the_geom)
GROUP BY hoods."MAPNAME", "Violatio_1"
ORDER BY violcount DESC)

SELECT DISTINCT ON ("MAPNAME")
    spatial_join."MAPNAME", 
    spatial_join.violcount,
    spatial_join."Violatio_1"
FROM spatial_join
ORDER BY spatial_join."MAPNAME", spatial_join.violcount DESC, spatial_join."Violatio_1"

This query will return a table with every single neighborhood in Philadelphia that had recorded violations, with the count of and name of the most frequent violation. The following table is random selection from the most frequent violations per neighborhood. It appears that the most frequent violation per neighborhood across the city is usually a citation by the Community Life Improvement Program (CLIP), an anti-blight and community beautification effort by the City of Philadelphia.

Neighborhood Most Frequent Violation Count
Dickinson Narrows CLIP VIOLATION NOTICE 13
East Mount Airy CLIP VIOLATION NOTICE 102
Greenwich EXT A-VACANT LOT CLEAN/MAINTAI 51
Haverford North CLIP VIOLATION NOTICE 19
Industrial CLIP VIOLATION NOTICE 4
Nicetown LICENSE-VAC RES BLDG 19
Northern Liberties ANNUAL CERT FIRE ALARM 25
Packer Park EXT A-CLEAN WEEDS/PLANTS 2
Parkwood Manor CLIP VIOLATION NOTICE 88
Powelton FIRE PROT SYS MAINTENANCE 4

Most Dangerous Violations Per Neighborhood

A photograph of a derelict building

Luckily, the table also has a “Priority” field, where violations are classified in a few different categories. I modified my original spatial join query to add a WHERE clause that looks for the “IMMINENTLY DANGEROUS” priority:

WITH clean_table AS
(SELECT *, to_timestamp("AddedDate"::BIGINT/1000) 
FROM incidents_wgs84 WHERE "AddedDate" ~ E'^\\d+$'),

recent_violations AS
(SELECT * FROM clean_table WHERE to_timestamp > '2014-12-31')

SELECT hoods."MAPNAME", COUNT(*) as violcount
FROM recent_violations
JOIN "Neighborhoods_Philadelphia" AS hoods
ON ST_Intersects(ST_Transform(recent_violations.the_geom,2272),hoods.the_geom)
WHERE recent_violations."Priority" SIMILAR TO 'IMMINENTLY DANGEROUS'
GROUP BY hoods."MAPNAME" 
ORDER BY violcount DESC

The resulting table contains an entry for every single neighborhood in the city with at least one Imminently Dangerous Violation from 1/1/2014 to 8/1/2015. I used QGIS to export the query result as a layer, and then brought the resulting shapefile into CartoDB through the handy CartoDB QGIS Plugin. One major benefit of using a PostGIS database to perform these kinds of spatial queries is space savings: the file I uploaded to CartoDB was only 140kb, which is amazing considering it started out as a 5.2 gigabyte shapefile. Here is the map of Imminently Dangerous Violations per Neighborhood on CartoDB:

Conclusion

The License and Inspection data that Nat scraped is a great resource for civic hackers and data analysts. But at over 5 gigabytes as a shapefile, it can get unwieldy very quickly. Using PostGIS to perform some initial transformations and simple queries is a great way to get started digging deeper into the data without filling up a hard drive. Finally, Common Table Expressions are an excellent way of keeping your PostGIS queries legible. Try taking PostGIS for a spin with the License and Inspections data. Write me @geotyler on Twitter if you have any questions.

Summer of Maps 2016: Now Accepting Applications for Nonprofits

Summer of Maps logo

Applications are now open for Nonprofits seeking pro-bono GIS analysis through the Summer of Maps program.  Summer of Maps offers fellowships to student GIS analysts to perform geographic data analysis for nonprofit organizations.  The program matches nonprofit organizations that have spatial analysis and visualization needs with talented students of GIS analysis to implement projects over a three-month period during the summer.  Below is the timeline for the 2016 program:

    • Now – Feb 7: Nonprofit organizations can submit brief proposals for spatial analysis projects to Azavea
    • Feb 7 – Feb 25: Azavea program administrators review organizations and narrows the list to finalists
    • Feb 26 – Mar 15: Students submit applications including proposals to work on finalist projects
    • Mar 13 – April 11: Student candidate reviews and interviews
    • April 11: Successful Summer of Maps fellows are notified
    • May 1: Public announcement of fellows and organizations
    • June – August: Summer of Maps fellows work on spatial analysis projects
    • For the most up to date schedule, please consult the Summer of Maps site.

What benefits do nonprofit orgs receive?

    • Pro-bono services from a talented student GIS analyst to geographically analyze and visualize your data
    • Visualization of data in new ways and combination of data with other demographic and geographic data to draw new observations
    • High quality maps that can be used to make a case to funders or support new initiatives

What benefits do students receive?

    • Opportunity to work spatial analysis projects that support the social missions of a nonprofit organizations
    • Work directly with Azavea mentors to improve GIS skills
    • Receive a monthly stipend
    • Gain work experience implementing and managing a GIS project

If you are a nonprofit organization and have a project you would like to see implemented, please submit an application.  The deadline is Sunday Feb 7th, 2016 11pm EST.  Nonprofits can check out the finalist organization proposals from 2015 for inspiration.  Keep in mind that students will be selecting from the finalist projects so identifying a project that is interesting and engaging is key in having your project be selected.  If you are a student, stay tuned – applications will open Feb 19th, 2016.

Fellowship Sponsors

We’d like to expand Summer of Maps and we’re looking for sponsors.  If you are interested in sponsoring a fellow or a mentor, please be in touch.

Tools for Getting Data out of OpenStreetMap and into Desktop GIS

Since its inception, OpenStreetMap has crowdsourced the addition of millions of features of spatial data across the world. It’s become a fantastic resource for geographic reference data and it’s constantly being improved and updated. While it’s open source and the data is free, getting it into a usable format for analysis in desktop GIS, for example, can take a bit of effort.

Data Formats

If you’re a GIS Analyst, you’re probably used to working with file types like shapefiles, CSVs, and GeoJSON. However, if you’ve downloaded OpenStreetMap data before, you probably noticed it came with its own file extension: .OSM. The .OSM file format is XML that describes the data. Though it might seem frustrating that OSM has its own file format, it’s actually very useful for filtering through the vast amount of tags contained in OSM data. Sometimes you’ll see a .PBF file, which is the compressed .OSM file. Luckily, QGIS has built-in support for OSM files and will render the layers. Below, I’ll talk about some tools to get OSM data into QGIS and in a future blog I’ll provide a tutorial for how to export data from OSM and right into QGIS for desktop analysis.

Right in the web browser…

 OpenStreetMap

openstreetmap_export

You can get OpenStreetMap directly from the web browser while viewing the data. To do so, click the Export button above the window. The export tab will open on the left side of your screen with the bounding box populated with the map coordinates you are currently zoomed to. This tool seemed to fail just about every time I tried it, with the exception of very small bounding boxes the size of only a few city blocks. However, below the Export button they provide some other ways to get the data, which I’ll talk about later.

If you want to use an API…

Overpass API

The Overpass API is a read-only API that serves up selections of map data from OSM. Unlike the main OSM API, it’s optimized to retrieve data. The OpenStreetMap wikipedia can be a helpful resource for using it. The Overpass API also powers some of the other extraction tools. So if you can master using the API, you’re essentially covered.

If you want a LOT of data….

Geofabrik Downloads

Geofabrik, a software consulting firm that specializes in OSM data, offers free extracts by continent, updated weekly. This is great if you want everything for an entire continent, but they’re large files, and you can’t filter on specific tags or features.

Planet OSM

If one continent isn’t enough for you, this resource offers extracts of the entire OSM database updated weekly as XML and PBF. They also provide a changeset database, so you can identify just the features that changed, if you need to do so. Like the Geofabrik exports, it’s a lot of data and you can’t filter for specific tags or features.

If you want data for one city, metro or geographic area…

MapZen Metro Extracts

MapZen built a tool to export all OSM data at the metropolitan region level for dozens of cities around the world. Thankfully for us, they provide the data on their Metro Extracts page. The data is refreshed weekly and it’s provided in six different file formats. Anyone can contribute by creating a pull request or issue for another metro area.

BBBike.org

bbbike_export

This extensive extraction tool allows you to create a custom bounding box to download data, or manually input the geographic coordinates. One of the great features is the ability to get the data in dozens of file formats, including shapefile, Garmin and SVG.

HOT Exports

hot_export_tool_areas

The Humanitarian OpenStreetMap team put together this extract tool. You need an OSM account to use it, but once you have that, you’re good to go. The tool has an interactive map viewer to locate the geographic area you need then draw the bounding box. It also allows you to select which feature tags to download. This is an easy-to-use tool, but it’s only available in the countries HOT works.

If you’re working in QGIS…

Quick OSM

This is a plugin available in the QGIS plugins repository. Once installed, it has a user-friendly GUI complete with a query window. This is nice because you can construct a query to filter specific types of data in the OSM that you want to return, based on tags in the data or other elements that you deem important. Once you’ve selected the bounding geographic box, you then have the option to display only the tags present in the data in your selection, so you can narrow the data you want even further down beyond that. You can also select to filter on custom queries and based on points, lines or polygons. Once the data is selected, it populates an XML query that you can reuse later.

OSM Downloader

Another plugin for QGIS is OSM Downloader. This plugin adds a button to the QGIS toolbar that allows you to interactively select a bounding box, or define the bounding coordinates, and download an OSM file for that geographic area. You can save the layer or add it to the QGIS canvas. It’s a quick and simple way to easily get data, but you can’t filter or query the data, so you’ll get everything in the bounding box you define.

QGIS Vector Menu

OpenStreetMap support is also built directly into the QGIS application. To add data, you’ve got three options: Download Data, Import Topology from XML, and Export Topology to Spatialite. If you haven’t already downloaded any OSM data, you can pretty much work through these tools in order to get OSM data into your map. This tool will give you the option to select the extent of the data you’d like to download either by the map canvas (the geographic area you are currently zoomed to in the map), from an existing layer or manually by typing in the bounding box coordinates. Note that this will download and generate an osm file for everything within the extent — so keep in mind you may get a very large file! For Philadelphia, it took about a minute to process a 65mb file.

If you’re working in PostGIS…

Osm2pgsql

This is a command-line tool for importing OSM directly into a PostGIS-enabled PostgreSQL database.

Imposm

This is another command-line tool to import OSM-based PBF files into PostGIS, including the ability to update a database based on diff files. Currently in its third version, it’s written in Go. Previous versions are written in Python.

In Part 2 of this blog, I’ll go into a couple tools more in depth. Also, don’t forget to read and understand the copyright agreement and required citations when you are using OSM data.

OSM Data Extraction Tools
Name Geographies File Formats Software
Overpass API Any, though large files take time OSM XML, OSM JSON None required
Geofabrik Continents OSM, PBF Web-based
PlanetOSM Global OSM, PBF Web-based
MapZen Metro Extracts Metropolitan Areas OSM, .PBF, Osm2pgsql SHP, GeoJSON, IMPOSM SHOP, IMPOSP GEOJSON, Water and Coastline SHPs Web-based
BBBike.org Bounding box to define OSM, PBF, SHP, CSV, SVG, Garmin Web-based
HOT Exports Bounding box to define, HOT countries only SHP, Garmin, OSM, KMZ, SQlite Web-based
QuickOSM Any, though large files take time OSM QGIS
QSM Downloader Any, bounding box to define OSM QGIS
QGIS Vector Menu Any, bounding box or layer to define OSM, PBF, JSON, SHP QGIS
Osm2pgsql Any, though large files take time PostGIS PostgreSQL, PostGIS
Imposm Any, though large files take time PostGIS PostgreSQL, PostGIS