Azavea Atlas

Maps, geography and the web

The History and Future of Disease Analysis and Visualization


While recently attending the ODI Summit Discovery Day in London, I had the opportunity to hear from Max Van Kleek about the broad possibilities of data visualization. He shared many examples of effective infographics and visualizations. Max explained that data analysis and visualization are most often used to do one of two things: to solve a problem, or to communicate a complex idea in a simple manner.

In public health, these two purposes of data visualization are invaluable for epidemiologists seeking to understand and track dangerous disease outbreaks. For example, in seeking to understand the 1854 cholera outbreak in London’s Soho neighborhood, physician John Snow (1813-1858) conducted a geographic survey of the locations of cholera deaths. Through his analysis he found that the outbreak was not airborne, as commonly thought, but linked to a contaminated water source. Once he used his analysis to convince the community of the source of the outbreak, the handle of the contaminated pump was removed, preventing further spread of the disease. Snow’s analysis is commonly believed to be the first use of geographic analysis to understand and solve a complex problem. While in London at the Summit, I visited the site of the pump, which was once the cause of hundreds of deaths but now serves as a symbol of how data-driven analysis helps us solve health problems and save lives.


A modern version of John Snow’s Map of Cholera Deaths in Soho, London


A replica of the water pump determined to be the source of the Cholera outbreak.

The second important purpose of data analysis and visualization in public health is to advocate and communicate effectively with a wide audience. Florence Nightingale (1820-1910) used patient data and a visualization of causes of mortality to prove that significantly more soldiers during the Crimean War were suffering from diseases related to contamination in hospitals than from actual war injuries. This advocacy helped to improve hospital conditions and encourage sanitary practices in medical facilities, which subsequently saved millions of lives. Nightingale’s effective, data-driven communication of this serious public health concern was integral in changing policy.


Florence Nightingale’s visualization of causes of death during the Crimean War


Historical uses of public health mapping have paved the way for methods used by epidemiologists today. At a recent GeoPhilly Meetup about the confluence of spatial analysis and public health, local researchers shared how they use these tools in various public health applications. For instance, Joan Bloch, PhD, CRNP at Drexel shared her public health work using geographic tracking. By mapping the time and cost of transportation, she was able to demonstrate the challenges low-income women face finding transportation to maternity-related care visits. This research is used to advocate for more thoughtful distribution of services to encourage better utilization of these key resources. Joan is presenting her work at the upcoming American Public Health Association (APHA) conference this month in New Orleans.

As analytical tools continue to evolve, spatial analysis and visualization will become even more valuable in understanding and tracking epidemic outbreaks like the current spread of Ebola in West Africa. Researchers are already charting the rate of disease spread geographically to determine how to best plan their health resources to limit future cases. With the increased use of GPS-enabled devices to record health data, the importance of using spatial analysis and visualization will become even more vital in fighting disease and promoting a healthier society.

Ebola tracking map created by the World Health Organization

 

2014 Geo Open Source Conference hosted by GeoPhilly and LocationTech is Coming!


Azavea is pleased to announce our participation in the 2nd annual LocationTech Tour which will feature a stop in Philadelphia. Registration is now open for the 2014 Geo Open Source Conference on November 20th in Philadelphia at ph.ly/opengeo.

Azavea is now in its second year as a member of the Eclipse Foundation and its working group, LocationTech.  We are glad to be joined by the likes of Boundless, IBM, Oracle, Google, and others.  This group functions as a thriving community for open source geospatial software to which we are proud to contribute.  The LocationTech Tour is a federated series of global events demonstrating and discussing new technologies and concepts with open source geospatial software and open data.

GeoPhilly, Philadelphia’s meetup group for map enthusiasts, is involved in the organization and presentation of this event.  Since its founding in fall 2013, the group has grown to over 375 members and has held 12 events.  Philadelphia’s LocationTech Tour event, the 2014 Geo Open Source Conference presented by GeoPhilly and LocationTech, will be held on Thursday, November 20th and will feature a speaker series in the afternoon and a social event in the evening.

This year’s event will feature talks by:


Additionally, a LocationTech Code Sprint will follow on Friday, November 21st to be held at Azavea’s office.  This event is a one day code sprint featuring opportunities to work on LocationTech project code with experts.  Join us to learn and network with industry experts in a friendly collaborative atmosphere with plenty of camaraderie.

Last year’s event in Philadelphia had a turnout of over 100 individuals interested in geospatial and open source technology.  You can find more information about last year’s events, as well as videos of presentations, in this recap.  Our deep gratitude to the organizers, speakers, participants, and supporters who make the Tour a success. Founding supporters of the 2014 Tour include Azavea, Boundless, Mapzen, Oracle, and the Open Source Geospatial Foundation (OSGeo).

Mission Emission: Analyzing and Mapping CO2 Emissions

The People’s Climate March on September 20th brought over 300,000 people to the streets of New York City to voice support for policies that reduce the man-made effects of climate change across the globe. It couldn’t be more timely that an international research team led by scientists at Arizona State University released the Fossil Fuel Data Assimilation System. It’s a global database of CO2 emission estimates at 0.1 decimal degree resolution (about 8-10 kilometers in the continental U.S., depending on latitude) containing hourly and yearly data from 1997 to 2010. You can visualize the data by year here, and it’s available for download in a few different file formats, including text and CSV. It’s quite an incredible database, and the first of its kind at that resolution. The CO2 is estimated using a combination of existing data sources such as population, remotely sensed nighttime lights and the location of power plants. The methodology can be found in Rayner et al. (2010) and Asefi-Najafabady et al. (2014).


My first question when discovering this dataset was “Where might CO2 emissions have increased or decreased since 1997?” My hunch was that overall emissions have increased, but that the spatial distribution might show some interesting trends across the United States. For example, the spatial distribution may have been affected by the fact that in 1997 the U.S. was in a period of strong economic growth, while in 2010 the country was still recovering from the Great Recession.

Data Processing

For this exercise, I converted the text files into polygons and sampled at the county level to get estimates of CO2 change between 1997 and 2010. Alternatively, the NetCDF files could be used to generate rasters for visualization of the data. QGIS has a NetCDF browser plugin for doing just that. Since the dataset is at the 0.1 degree resolution, it lends itself to creating a raster quite easily. Originally, I planned to vectorize the raster dataset to display it in CartoDB (as of now, CartoDB does not support raster files). However, I thought it would be more interesting and useful to aggregate the data to a more common unit of analysis that people would understand, such as counties.

There are a few ways to go about this. I’ll describe how to do this in ArcGIS using the ET GeoWizards plugin, but it’s also possible to do this analysis in QGIS. ET GeoWizards is available as a free ArcGIS plugin. There’s also a paid version with some more advanced features that requires a license.

First, I had to convert the text files containing the coordinates and CO2 emission value into a vector dataset. Originally thinking I was going to create a raster visualization, I used the Point to Raster tool in ArcGIS to create a raster surface of the data, but later decided to calculate the data at the county level, and vectorized the raster data with the Raster to Polygon tool. This produced polygons of raster cells at the 0.1 degree resolution. The data itself is measured in kilograms of carbon (kgC) per square meter, so I transformed the data to total kgC, then converted to metric tons. Since the vector polygons are smaller than counties, I was able to resample the data using the Transfer Attributes tool in ET GeoWizards. This tool applied a proportion of the CO2 emissions total to each county polygon based on the proportion of the CO2 emissions polygons that overlapped each county. If the CO2 emissions polygon was entirely inside the county, the total amount was applied. Next, I summarized CO2 emissions by county. Quite conveniently, the Transfer Attributes tool will do all this.
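The unit conversion step can be sketched in a few lines of Python. The cell area and emissions value below are hypothetical, for illustration only:

```python
def cell_emissions_tons(kgc_per_m2, cell_area_m2):
    """Convert a cell's kgC per square meter value to total metric tons."""
    total_kgc = kgc_per_m2 * cell_area_m2
    return total_kgc / 1000.0  # 1 metric ton = 1,000 kg

# A 0.1-degree cell near 40 degrees N is roughly 8.5 km x 11.1 km.
area = 8_500 * 11_100  # square meters
print(cell_emissions_tons(0.002, area))  # ~188.7 metric tons
```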

If you don’t have access to ArcGIS, the entire attribute transfer process can be accomplished using the QGIS Intersect and Dissolve tools. The county and smaller CO2 emissions polygons can be intersected in QGIS, and the area of the resulting polygons can be divided by old polygon areas to get a ratio. That ratio can then be applied to the CO2 emissions polygons and summarized at the county level.
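The area-ratio logic behind both the Transfer Attributes tool and the QGIS intersect-and-dissolve approach can be sketched in plain Python, assuming the cell/county intersection areas have already been computed in a GIS (all identifiers and values below are hypothetical):

```python
from collections import defaultdict

def apportion_to_counties(cells, intersections):
    """Area-weighted transfer of cell emissions to counties.

    cells: {cell_id: (total_emissions, cell_area)}
    intersections: list of (cell_id, county_id, overlap_area) triples,
        i.e. how much of each cell's area falls inside each county.
    """
    county_totals = defaultdict(float)
    for cell_id, county_id, overlap_area in intersections:
        emissions, cell_area = cells[cell_id]
        # Apply the share of the cell's emissions proportional to
        # the share of its area inside the county.
        county_totals[county_id] += emissions * overlap_area / cell_area
    return dict(county_totals)

# Toy example: cell "a" is split 60/40 between counties 1 and 2;
# cell "b" lies entirely inside county 2.
cells = {"a": (100.0, 10.0), "b": (50.0, 10.0)}
overlaps = [("a", 1, 6.0), ("a", 2, 4.0), ("b", 2, 10.0)]
print(apportion_to_counties(cells, overlaps))  # {1: 60.0, 2: 90.0}
```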

These processes work under the assumption that the value for the CO2 emissions polygons are a total amount of emissions for the entire area of the polygon. However, CO2 emissions are rarely uniformly distributed — therefore, using this coarse resolution CO2 at any geographic level smaller than a county is probably not appropriate.

Below you’ll find a map of estimated CO2 emissions change in percentage between 1997 and 2010 for every county in the United States, visualized in CartoDB.
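The percent-change metric behind the map is simple to compute once each county has totals for both years; here is a minimal Python sketch (the function name and county totals are mine, not part of the FFDAS tooling):

```python
def top_changes(t1997, t2010, n=3):
    """Percent change in emissions per county, largest increase first."""
    changes = {
        county: round((t2010[county] - t1997[county]) / t1997[county] * 100.0, 1)
        for county in t1997
    }
    return sorted(changes.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical county totals in metric tons of carbon.
t1997 = {"Collin": 800.0, "Denton": 600.0, "Wayne": 1000.0}
t2010 = {"Collin": 1200.0, "Denton": 780.0, "Wayne": 900.0}
print(top_changes(t1997, t2010))
# [('Collin', 50.0), ('Denton', 30.0), ('Wayne', -10.0)]
```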

Increase in Carbon Dioxide Emissions

In the Tableau visualization at the bottom of the page, you can also view the ten counties with the largest increase in CO2 emissions in metric tons, compared to their population change. Most notable are the increases in Collin and Denton Counties, Texas. Both counties are in suburban Dallas. Other counties are in similarly fast-growing areas in California and Florida, along with Allegheny County, Pennsylvania (home to Pittsburgh). It makes sense that rapidly growing areas would see such an increase in CO2 emissions. Allegheny County is a different story, since it lost population during the data’s time frame. Overall, all major Texas metros saw significant increases, as did Florida — with the notable exception of Miami. Southern California, Las Vegas and Phoenix saw increases. There’s also a noticeable trend of increase across the central Midwest, from Illinois through Ohio.

Decrease in Carbon Dioxide Emissions

The counties with the largest decrease in carbon dioxide emissions are mostly in parts of the country that haven’t done too well economically over the past 20 years, such as the Detroit area. New England and most of the Northeast in general also saw a decrease, including the cities of Boston, Philadelphia and Baltimore. CO2 emissions decreased across the Northwest, including Portland and Seattle. Perhaps the starkest trend visible is the decrease across the Great Plains, surely related to the decrease in population in that part of the country.

Largest Counties in the U.S.

Of the ten largest counties in the U.S., most saw increases in CO2 emissions. The exceptions are the two counties (boroughs) in New York (Kings and Queens) and Miami-Dade, Florida. Both Kings and Queens saw only small population increases, but Miami-Dade saw a rather significant increase in population. Cook County, Illinois was the only county in the top ten to see an increase in emissions but a decrease in population. We find that in Maricopa County, home to Phoenix, and Harris County, home to Houston, CO2 emissions growth outpaced population growth over the time period.

Further Study

There are certainly opportunities to better understand what the data is indicating. For example, we might expect areas with rapidly growing populations to experience an increase in CO2 emissions, but what about counties that declined during the time period, such as Allegheny County, PA? There are also some interesting trends to explore further. This is a great start, but I hope to see a higher resolution version of this dataset released in the future.

You are Invited: Geography Week in Philadelphia


Geography Awareness Week, November 16th-22nd 2014, features activities and events all over the world related to geography and mapping.  Geography Awareness Week was originally established by National Geographic to promote geography in American education and to excite people about geography as both a discipline and a part of everyday life.

In Philadelphia alone, there are numerous events related to geography and mapping during this special week.  Below is a round-up!

Monday November 17th

Happy Hour Lecture with Carol Collier of the Academy of Natural Sciences

Tapping Our Watershed, formerly known as the Delaware River Watershed Initiative Seminar Series, will meet next on Monday, Nov. 17, to raise glasses with Carol Collier, the Academy’s senior adviser for watershed management and policy.

Tuesday November 18th

Esri GeoDev Meetup: Northeast

This event is a social gathering for developers to discuss the latest in mapping, geo technology, geo services, web and mobile mapping apps, app design, cloud solutions, map data or anything else related to solving real-world “geo” problems.  Developers of all levels of expertise are welcome, from seasoned GIS professionals to those new to geospatial development.

Map Measure Manage: How city government uses place-based data for decision making & civic engagement

A showcase of apps developed by or for Philadelphia City government for data-driven decision-support using spatial technology (GIS). See how your city government uses data to make decisions and reach out to citizens.  Find out what data and tools are available for civic projects.  Drop in for a Q&A with City staff and see how these tools are helping to improve city services.

Wednesday November 19th

Penn GIS Day

Penn GIS Day, held in conjunction with the National GIS Day celebration, focuses on real-world applications and innovations stemming from uses of Geographic Information Systems. The forum examines the use of GIS both at Penn and more broadly, offering an opportunity for professional and academic interaction.

Thursday November 20th

2014 Geo Open Source Conference Hosted by LocationTech and GeoPhilly

Philadelphia’s LocationTech event will be a conference-style speaker series featuring technical talks on the convergence of open source and geospatial. A happy hour follows the conference.

Friday November 21st

LocationTech Code Sprint

This code sprint, hosted at Azavea’s office, provides an opportunity to work on open source geospatial project code with experts. All projects are welcome.

Sunday November 23rd

250 Miles Crossing Philadelphia – down the rabbithole

A hand, reaching for a gift wrapped in wrinkly tissue paper with an image of the street that lies on the opposite side of the location where this image can be seen — the facade of the Apple Storage Building on 52nd and Willows Avenue. For us this image is the rabbit-hole that leads us into the virtual world in which our project 250 Miles Crossing Philadelphia takes place. We took the digital Street View image from that exact location, printed it, and made it into this new world again.

 

Take the opportunity during Geography Awareness Week to attend these events and explore how geography awareness enhances your daily life.

Chart Your Way to Visualization Success

Following up on the themes of Sarah’s earlier blog post, “4 Cartography Color Tips Hue Should Know”, here are a few tips I picked up at DataWeek 2014 in San Francisco this September:

Visualizations and infographics are a powerful way to communicate data. However, with great power comes great responsibility, so here are a few ways to make sure they turn out clean, beautiful, and well-suited for their purpose: to be shared with the public.

Use The Cycle of Visual Analysis

Tableau guru Mike Klaczynkski defined the cycle of visual analysis as a six-step process that’s applicable across a broad range of data analysis:

  1. Define the question

  2. Get data to answer the questions

  3. Structure and clean the data

  4. Visualize and explore the data

  5. Develop insights

  6. Share the results

Simple enough, right? The last step, however, is a doozy. If you’ve gone through the trouble of steps 1-5 and then don’t share the results clearly, you could unravel all that hard work. As a data professional, you should provide legible visualizations to share your results with your intended audience.

Check Your Charts Before You Wreck Your Charts

When producing a visualization, do what Dave Fowler from Chart.io recommends and ask yourself: Am I trying to impress people with how cool this looks? Or am I trying to share my results clearly? If you’re more concerned about bells and whistles on your visualizations, you’ll end up with graphics from a 1997 clipart nightmare instead of a powerful way to share your message. Use the eight steps below and chart a voyage away from the rocky shores of bad decisions:

  1. Make your visualization audience-appropriate. You might not use the same chart to explain something to your dad as you would for your fellow data analysts. You might if he were also a data analyst.

  2. Make a graphic appropriate to the data (e.g. don’t make a time series for something with no time component). This site has a great breakdown on what kinds of charts to use for what kinds of data.

  3. Make sure it’s not a pie chart (people can understand square area better than they can circular areas). Read Death to Pie Charts to learn more and also get a bunch of great visualization tips.

  4. If you’re making a map, make sure it’s not just showing population density, as pointed out in this excellent example from webcomic XKCD (which has been linked before in a previous Atlas Blog about bicycle and pedestrian crashes in Philadelphia by Daniel McGlone). Sometimes you can get around it by normalizing the data by population.

  5. Avoid skeuomorphism in your charts, that is, trying to make an object look like the thing it represents. While there’s still some debate about whether websites and apps need to stop being skeuomorphic, there’s no question that pseudo-3D charts with photos of bananas on them need to go.

  6. Ask yourself if you’re showing the data clearly.

  7. See if someone unfamiliar with the results can interpret it.

  8. Show it off to everyone!
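The population-normalization trick mentioned in tip 4 above can be sketched in a few lines of Python (the counts and populations below are made up for illustration):

```python
def per_capita_rate(count, population, per=100_000):
    """Normalize a raw count to a rate per `per` residents."""
    return count * per / population

# Raw counts mostly track population size; rates reveal real differences.
big_city = per_capita_rate(500, 1_500_000)  # ~33.3 per 100k
small_town = per_capita_rate(30, 40_000)    # 75.0 per 100k
print(big_city < small_town)  # True
```

Mapping the rate instead of the raw count avoids reproducing the XKCD population-density map.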

Chart Your Journey to Better Visualization

There are a ton of resources out there to make sure that your visualizations look good and get your message across. Get a head start by checking out the beautiful infographics blog, Information is Beautiful, thumbing through books by legendary visualization experts Edward Tufte or Stephen Few, or trying your hand at a cornucopia of data visualization tools at Datavisualization.ch.

Remember, the point of any visualization, whether it’s a chart, graph, or a map, is to communicate data to an audience in a meaningful format.