Articles by
Dave Felcan

Research: The Amazon Elastic Cloud

"I am exploring the use of The Elastic Compute Cloud (EC2) as a resource for some of Azavea projects already in use. DecisionTree, our geographic prioritization system, was an ideal first candidate..."

I am very excited about my Azavea research project on the Amazon Elastic Compute Cloud (Amazon EC2), a technology from Amazon Inc. that is shifting a lot of people’s ideas about what computing is and can do. Amazon EC2 has arisen through the confluence of technological innovations of the past few years.

First, some background. One of the most basic pieces of infrastructure on the World Wide Web today is the ubiquitous entity known as “The Server”. The term describes a computer that performs some task or tasks on behalf of other computers. For example, web pages come from a web “server”, which sends them to your computer for you to see. This web server may in turn query other servers to complete the request: for example, it may contact a database server to get data or a geospatial server to produce a map image.

The idea behind a computing “cloud” (and there are others, as referenced in Robert’s ‘What the Heck is…’ article above) is a collection of computers accessible from the internet that “instantiate” whole virtual computers, complete with their operating systems, software, and data, which can be accessed on demand. One can instantiate one of these machines, connect to it over the internet through standard remote connection protocols, and voila! Your screen shows the desktop of a “computer” that behaves exactly as if it were sitting under your desk.

While this approach is odd for desktops, it can offer many benefits for servers. With a few clicks of a mouse, multiple copies of the same server can be up and running at the same time to handle increases in demand, and they can be shut down again when no longer needed. The details and headaches of actually owning and running physical machinery are offloaded to the cloud provider, which also supplies the bandwidth. Once you have a working version of a website, database, or geospatial server, it can be copied and reused, with no need to start the configuration from scratch.

For my research project, I am exploring the use of the Elastic Compute Cloud (EC2) as a resource for some Azavea projects already in use. DecisionTree, our geographic prioritization web system, was an ideal first candidate. This product requires strong computing resources and was designed from the ground up to run on multiple computers. With EC2 we were able to run DecisionTree on 10 instances at once, dramatically speeding up its operations and providing a mechanism for running DecisionTree for customers who do not want to maintain their own server infrastructure.
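As a rough illustration of what running on multiple computers involves, the sketch below splits a list of work items evenly across a number of instances. It is not DecisionTree code: the function name, the round-robin strategy, and the region names are all assumptions made for the example.

```python
def partition_work(tasks, instance_count):
    """Split tasks into instance_count nearly equal batches (round-robin)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    batches = [[] for _ in range(instance_count)]
    for index, task in enumerate(tasks):
        # Task 0 goes to instance 0, task 1 to instance 1, and so on, wrapping around.
        batches[index % instance_count].append(task)
    return batches


# Hypothetical workload: 25 map regions spread over 10 instances.
regions = [f"region-{n}" for n in range(25)]
batches = partition_work(regions, 10)
for instance_number, batch in enumerate(batches):
    print(instance_number, batch)
```

Each batch could then be handed to one instance, and the results combined once every instance has finished.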

In addition to DecisionTree, we are experimenting with running our Cicero legislative and election data service on EC2, and with other ways to leverage Amazon Web Services. For example, last spring we tested a map image ‘tile cache’ service that generates and stores a set of map tiles, enabling an organization to reduce bandwidth usage and improve the responsiveness of a high-traffic web mapping application. While EC2 was originally limited to Linux-based software, the recent addition of Windows Server as a target platform has provided much more flexibility. Do you have ideas for how you could use Amazon Web Services for your GIS project? Let us know.
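The tile cache idea can be sketched in a few lines: render a tile the first time it is requested, store it, and serve the stored copy afterwards. This is a minimal illustration, not the service we tested. The class name, the `(zoom, x, y)` key, and the stand-in render function are assumptions, and an in-memory dictionary stands in for real storage.

```python
class TileCache:
    """Render each map tile once, then serve it from storage."""

    def __init__(self, render_tile):
        self._render_tile = render_tile  # function that produces a tile image
        self._tiles = {}                 # assumption: in-memory stand-in for real storage
        self.render_count = 0

    def get(self, zoom, x, y):
        key = (zoom, x, y)
        if key not in self._tiles:
            # Cache miss: do the expensive rendering and remember the result.
            self._tiles[key] = self._render_tile(zoom, x, y)
            self.render_count += 1
        return self._tiles[key]


def fake_render(zoom, x, y):
    """Hypothetical renderer; a real one would draw a map image."""
    return f"tile z={zoom} x={x} y={y}".encode()


cache = TileCache(fake_render)
first = cache.get(12, 1193, 1551)
second = cache.get(12, 1193, 1551)  # served from the cache, no second render
print(cache.render_count)
```

The saving comes from the second request: the tile is returned without being rendered again.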

Philadelphia Police Department Makes Crime Mapping Application Available to the Public

"People like to know what goes on in their neighborhoods ... they want to know if any crimes have been committed nearby."

People like to know what goes on in their neighborhoods. Most of us want to know if a new family is moving in down the block, if a store is closing or a new business opening, and, perhaps more than these, we want to know if any crimes have been committed nearby. And when it comes to something as important as crime, we want that information from a credible source. While police departments across the country record this information, it is generally only used internally by police personnel. In recent years a relatively small number of city police departments have started making the data available to the public.

Philadelphia’s Police Department (PPD) is now one of these select police departments. In response to widespread public concerns about crime in the city, Mayor Nutter and Police Commissioner Ramsey charged the Police Department with creating a public website where city residents can map the incidence of major crimes in Philadelphia. Based on our previous work with crime analysis applications (such as Crime Spike Detector and PhiCAMS), the PPD selected Azavea to develop the system. Working closely with the Police Department and the Mayor’s Office of Information Services (MOIS), we were able to get the application up and running in just six weeks.

The emphasis of the site is on simple, accurate display of crime occurrence across the city in a “pin map” style. All crime data is fed nightly to the site directly from the Philadelphia Police Department’s databases. Up to 30 days’ worth of crime can be viewed simultaneously, and a data download feature enables anyone to extract the data for more rigorous analysis.

One of the greatest challenges in creating the site was the need to display even a relatively high volume of crimes at every scale. For example, theft is the most numerous of the so-called “Part 1” crimes (the more serious crimes). Viewing thefts city-wide for a typical thirty-day period may result in 3,000 or more data points. A map depicting this situation would simply be a mass of undifferentiated points, which is not useful to anyone.

To address this concern, the site uses a common cartographic technique called “aggregation”: taking many points concentrated in the same area and lumping them together into a single larger point, sized in proportion to the number of points it represents. This is analogous to the way many atlases size the points that represent cities according to their population. The website computes these aggregations “on-the-fly” depending on how far one has zoomed in or out of the map. There are several techniques for accomplishing this type of task. We used the ‘K-means clustering’ approach, which is a method for finding the centers of natural clusters.
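To make the aggregation step concrete, here is a minimal K-means sketch that reduces a set of points to a few cluster centers, each with the number of points it represents. It is an illustration under stated assumptions, not the site's implementation: the function names, the evenly spaced starting centers, and the sample coordinates are all hypothetical.

```python
import math


def assign(points, centers):
    """Group each point with its nearest center."""
    clusters = [[] for _ in centers]
    for point in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
        clusters[nearest].append(point)
    return clusters


def k_means(points, k, max_iterations=50):
    """Return (center, point_count) pairs for k clusters."""
    # Assumption: start from k points spread evenly through the input list.
    step = len(points) / k
    centers = [points[int(i * step)] for i in range(k)]
    for _ in range(max_iterations):
        clusters = assign(points, centers)
        # Move each center to the mean of its members; keep it if it has none.
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centers)
        ]
        if updated == centers:
            break  # centers stopped moving, so the clustering has converged
        centers = updated
    clusters = assign(points, centers)
    return [(center, len(c)) for center, c in zip(centers, clusters)]


# Hypothetical incident locations forming two natural groups.
incidents = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (11, 11)]
markers = k_means(incidents, k=2)
for center, count in markers:
    print(center, count)
```

Each returned pair could be drawn as one marker at the cluster center, with its size scaled by the count.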

We are excited by this new initiative and hope the public will find it useful. Visit http://citymaps.phila.gov/crimemap to check out the application or your neighborhood.

OLAP: Online Analytical Processing

Online Analytical Processing (OLAP) is a technology that extends conventional database technology by enabling rapid analysis of aggregated data. Like most information technology, OLAP comes with its own vocabulary. Whereas data in a traditional database is stored in two-dimensional tables, OLAP databases store data in multi-dimensional cubes that enable people to change their view of aggregated data quickly and with less effort. The cube is made up of numeric facts called measures, such as the ‘number of packages of widgets shipped to a client’. Measures are organized by dimensions. Some typical dimensions might include time, product categories, delivery areas and so on.
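The relationship between measures and dimensions can be illustrated with a small roll-up: the same facts are totaled along whichever dimensions the viewer chooses. This is only a sketch of the concept. The field names and figures are invented, and a real OLAP engine typically pre-computes and stores such aggregates rather than scanning the facts on each request.

```python
from collections import defaultdict

# Hypothetical facts: one measure ("packages") described by three dimensions.
facts = [
    {"quarter": "Q1", "product": "widgets", "area": "north", "packages": 120},
    {"quarter": "Q1", "product": "widgets", "area": "south", "packages": 80},
    {"quarter": "Q1", "product": "gadgets", "area": "north", "packages": 40},
    {"quarter": "Q2", "product": "widgets", "area": "north", "packages": 150},
    {"quarter": "Q2", "product": "gadgets", "area": "south", "packages": 60},
]


def roll_up(facts, dimensions, measure):
    """Total a measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


by_quarter = roll_up(facts, ["quarter"], "packages")
by_product = roll_up(facts, ["product"], "packages")
by_quarter_and_area = roll_up(facts, ["quarter", "area"], "packages")
print(by_quarter)
print(by_quarter_and_area)
```

Changing the list of dimensions changes the view of the data without changing the underlying facts.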

OLAP cubes can be queried in a similar manner to a conventional database, but while most databases use Structured Query Language (SQL), their OLAP brethren have their own language, called MultiDimensional eXpressions (MDX). You wouldn’t want to use MDX to run your sales transaction database, but it’s ideally suited to create a report such as ‘Total Packages Delivered by Route by Product Source per Quarter for the last 5 years’.

The output of an MDX query can be represented in all of the traditional ways including tables and charts, but we are obviously interested in the geography and maps. While OLAP systems have been used in large businesses to analyze sales and other data for many years, their use with geographic data has been limited. Geospatial information has special properties that are not captured in most OLAP systems, such as proximity and cartographic hierarchies (like various zoom levels). The distribution of events in space and time has much to say about those events, and the spatial part of that equation is not yet incorporated fully in many of the tools on the market today. By incorporating these special properties into OLAP cubes, more powerful data analysis can be performed, revealing new and important patterns in information. My research seeks to bring spatial analysis into the OLAP world and broaden the power and applicability of this technology. I am particularly interested in real estate data and am working with several years of Vermont real estate sales.

SBIR Grant Award Announcement: HunchLab – Leveraging Spatial Statistics to Validate Human Intuition

As part of their daily activities, police officers often formulate hunches based on observations and other sources of information. Large amounts of crime data already exist in electronic form, so officers have been using information management systems and visualization tools to help sift through this data. Despite the availability of these tools, hunches remain difficult to confirm or deny.

We are pleased to announce that the National Science Foundation recently awarded Azavea a Phase I Small Business Innovation Research (SBIR) grant to design and evaluate ‘HunchLab’, a prototype system that will enable police officers to develop and evaluate hunches.

‘HunchLab’ was inspired by the Crime Spike Detector that Azavea developed to help the Philadelphia Police Department (PPD) identify when and where unusual increases in crime are occurring. The Crime Spike Detector, which has been in operation since June 2005, uses a spatial statistics algorithm developed in conjunction with Dr. Tony Smith (University of Pennsylvania) to compare current crime to historical crime across the city. Each night this ‘data mining’ service checks for spikes in different types of crime. Unusual increases result in an email being sent to the relevant district captain. The email details the severity of the spike and links to an online report with maps, charts and tables, enabling analysis of the result. Although ‘HunchLab’ will initially be developed to assist with crime detection, tools such as the Spike Detector and ‘HunchLab’ are applicable wherever the geographic distribution of events changes over time, such as disease occurrence, consumer buying patterns and real estate sales.
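The general idea of comparing current counts to historical counts can be sketched with a simple standard score. This is explicitly not the spatial statistics algorithm used by the Crime Spike Detector, which is not described here. The function names, the threshold of 2.0, and the weekly counts are assumptions chosen only to illustrate the comparison.

```python
import statistics


def spike_score(current_count, historical_counts):
    """Return how many standard deviations the current count is above the historical mean."""
    mean = statistics.mean(historical_counts)
    spread = statistics.stdev(historical_counts)  # sample standard deviation
    if spread == 0:
        # No historical variation: any increase at all is unusual.
        return float("inf") if current_count > mean else 0.0
    return (current_count - mean) / spread


def is_spike(current_count, historical_counts, threshold=2.0):
    """Flag a spike when the score exceeds the threshold (assumed value)."""
    return spike_score(current_count, historical_counts) > threshold


# Hypothetical weekly counts of one crime type in one district.
history = [10, 12, 9, 11, 13, 10, 12]
print(is_spike(20, history))
print(is_spike(12, history))
```

A nightly job could run a check like this for each crime type and area, and send a notification whenever it reports a spike.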

‘HunchLab’ is supported by the Small Business Innovation Research program of the National Science Foundation, Directorate for Engineering, Division of Industrial Innovations and Partnerships, Award Number (IIP-0637589).

Cicero

The past ten years have seen an unprecedented increase in the number of non-governmental organizations that specialize in providing communities with a voice in local politics. Non-profits embark on a variety of campaigns that seek to give their members and the public information about local elected officials and a way to voice their opinions. Several of these organizations have realized that their correspondence campaigns lack effectiveness because the recipients do not know exactly which local elected official they should be contacting.

Realizing that geography lies at the core of this problem, several non-profits turned to Azavea in search of a solution. In response, we are proud to present a new web service aimed at bridging the gap between political advocacy and local government. We call it Cicero, in honor of the legendary Roman orator of the 1st century BC. Cicero uses a “geocoding” service to locate an address in more than 30 cities nationwide, telling the inquiring user who their local elected official is and how to contact him or her. We continue to add new cities every week. We invite you to check out Cicero for yourself at http://www.azavea.com/cicero.
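Once an address has been geocoded to a coordinate, matching it to an elected official comes down to finding which district boundary contains that point. The sketch below shows one standard way to do that, the ray-casting point-in-polygon test. It is not Cicero's implementation: the district shapes, official names, and function names are hypothetical, and the geocoding step is assumed to have already produced the coordinate.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a ray heading right from the point crosses."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # the edge spans the point's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


# Hypothetical districts drawn as simple squares, with made-up officials.
districts = {
    "District 1": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "District 2": [(5, 0), (10, 0), (10, 5), (5, 5)],
}
officials = {"District 1": "Official A", "District 2": "Official B"}


def find_official(location):
    """Return (district, official) for a geocoded location, or None if no district matches."""
    for name, boundary in districts.items():
        if point_in_polygon(location, boundary):
            return name, officials[name]
    return None


print(find_official((2, 3)))
print(find_official((7, 1)))
```

Real district boundaries are far more irregular than these squares, but the same containment test applies.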