About HunchLab

Catchment

To detect statistical aberrations indicative of a spike in crime, HunchLab tests user-defined search patterns by breaking the entire search area (such as a city) into statistical catchment basins. An individual catchment is a circle defined by a given radius around a point. An evenly-spaced lattice of points is imposed such that the entire search area is completely covered by catchments.
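The lattice of catchments described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a square lattice over a rectangular bounding box in planar coordinates, and the function names (catchment_centers, catchments_containing) are hypothetical rather than part of HunchLab. Circles of radius r on a square lattice leave no gaps when the spacing is no more than r times the square root of 2, which is the rule applied here.

```python
import math

def catchment_centers(min_x, min_y, max_x, max_y, radius):
    """Return circle centers on a square lattice that covers the bounding box.

    Assumption: planar coordinates (e.g. feet or meters), not latitude/longitude.
    """
    # Largest spacing at which circles of this radius still cover every point.
    max_spacing = radius * math.sqrt(2)
    # Number of lattice cells along each axis (at least one).
    nx = max(1, math.ceil((max_x - min_x) / max_spacing))
    ny = max(1, math.ceil((max_y - min_y) / max_spacing))
    # Shrink the spacing so the cells divide the box evenly.
    sx = (max_x - min_x) / nx
    sy = (max_y - min_y) / ny
    return [
        (min_x + (i + 0.5) * sx, min_y + (j + 0.5) * sy)
        for i in range(nx)
        for j in range(ny)
    ]

def catchments_containing(point, centers, radius):
    """Return the centers of every catchment whose circle contains the point."""
    px, py = point
    return [c for c in centers if math.hypot(px - c[0], py - c[1]) <= radius]

# Example: a 10 x 10 search area covered by catchments of radius 2.
centers = catchment_centers(0, 0, 10, 10, 2)
```

Because neighboring circles overlap, a single incident can fall into more than one catchment, which is why catchments_containing returns a list.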

Potential crime spikes are identified by comparing a particular category of crime data for a recent period to a historical comparison period -- e.g. the past 30 days as compared to the previous 365 -- for each catchment, thereby tying statistical aberrations to a specific place and giving police useful information to work with. The detection of statistical patterns is sensitive to the definition of catchment size, so HunchLab includes a number of tools to help users define Hunches.

Data Mining

Also known as 'knowledge discovery', this is the practice of analyzing large quantities of data with the goal of identifying meaningful patterns. Law enforcement agencies collect far more data than personnel are able to analyze manually. The effective deployment of resources demands that this process be automated through algorithms that use statistical analysis to interrogate massive quantities of data and summarize the results in the form of actionable information. HunchLab combines the power of automated data mining with the specialized knowledge of law enforcement officers through the creation of user-defined Hunches.

Threshold Hunch

A Threshold Hunch allows users to define a geographic area and timeframe of interest and to set a raw count as the threshold that triggers an alert. This differs from a Statistical Hunch, where an alert is triggered by a statistical test rather than a fixed number. An example might be "Send me an alert when 5 or more aggravated assaults occur in the 7th district in one week". Similarly, analysts can set the parameters for a Mass Threshold Hunch, defining a large spatial area as being of interest. This area is automatically partitioned into catchments, and an alert is triggered when the reported data meet the threshold criteria in any of these catchments. Like Statistical Hunches, Threshold Hunches are evaluated on a schedule set by users.
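The example alert quoted above ("5 or more aggravated assaults in the 7th district in one week") amounts to a count compared against a fixed number. The sketch below shows that logic only; the record layout (crime_class, district, occurred) and the function name evaluate_threshold_hunch are assumptions made for illustration and do not reflect HunchLab's actual data model.

```python
from datetime import datetime, timedelta

def evaluate_threshold_hunch(incidents, crime_class, district,
                             window_end, window_days, threshold):
    """Return (alert, count) for incidents inside the area and time window."""
    window_start = window_end - timedelta(days=window_days)
    count = sum(
        1
        for inc in incidents
        if inc["crime_class"] == crime_class
        and inc["district"] == district
        and window_start < inc["occurred"] <= window_end
    )
    # The alert fires on a raw count, with no statistical test involved.
    return count >= threshold, count

# Hypothetical data: six assaults, five of them within the last seven days.
now = datetime(2010, 6, 8)
incidents = [
    {"crime_class": "aggravated assault", "district": "7",
     "occurred": now - timedelta(days=d)}
    for d in (0, 1, 2, 3, 5, 9)
]
alert, count = evaluate_threshold_hunch(
    incidents, "aggravated assault", "7", now, window_days=7, threshold=5
)
```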

Fisher's Exact Test

Fisher's Exact Test is a test of statistical significance used to determine whether there is a nonrandom association between two categorical variables. It is typically performed when each of the two variables falls into one of two mutually exclusive categories. The test is conducted by constructing a 2-by-2 contingency table and summing the probabilities of every table in which the distribution of values is at least as extreme as in the observed case, in the same direction, with the marginal totals held fixed (the same total number for each variable and category). This sum is the p-value. Under these conditions, the null hypothesis of independence results in a distribution of data values that conforms to the hypergeometric distribution. An extremely small p-value indicates a nonrandom association and, in the context of HunchLab, a spike in crime.
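As a rough illustration of the procedure just described, the following Python sketch computes a one-sided p-value for a 2-by-2 table using only the standard library. The function name fisher_exact_greater and the cell labels a, b, c, d are chosen here for the example; the data are the classic small "tea tasting" table, not crime data.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's Exact Test for the table [[a, b], [c, d]].

    Returns the probability, with all marginal totals fixed, of seeing a
    top-left cell at least as large as the observed value a.
    """
    row1 = a + b          # first row total
    col1 = a + c          # first column total
    total = a + b + c + d
    denominator = comb(total, row1)
    p_value = 0.0
    # Sum hypergeometric probabilities for every table at least as extreme.
    for x in range(a, min(row1, col1) + 1):
        p_value += comb(col1, x) * comb(total - col1, row1 - x) / denominator
    return p_value

# Observed table [[3, 1], [1, 3]]: the p-value is 17/70.
p = fisher_exact_greater(3, 1, 1, 3)
```

A p-value this large (about 0.24) would not count as evidence of association; a spike corresponds to a value far closer to zero.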

Statistical Hunches

HunchLab enables law enforcement officials to evaluate their intuitions about changes in crime patterns through the creation of Statistical Hunches. To define a Statistical Hunch, users are required to specify a few essential criteria:

  • Crime class(es)
  • Geographic search area
  • Current time period
  • Historical or comparison time period

To test the validity of a Statistical Hunch, HunchLab uses the hypergeometric algorithm to compare incident data in the specified crime class(es) and geographic area from the current period to the same data types in the historic period. Hunches that are statistically confirmed trigger an automated reporting system that alerts appropriate law enforcement personnel.

For more detail about how Hunches are created and managed, check out the Features page.

Hypergeometric Distribution

The Hypergeometric Distribution is useful in situations where sampling from a heterogeneous and finite population is performed without replacement. It describes the probability distribution of a hypergeometric random variable, defined as the number of successes in a statistical experiment in which a sample of size n is selected without replacement from a population of N items, of which k items are considered successes and N-k items are considered failures.

You might think of this as a situation where you have a bag of 100 balls: 90 white and 10 red. Suppose you select 5 balls at random without looking. What are your chances of getting 0 red balls? 1 red ball? 2 red balls, and so on up to 5? The Hypergeometric Distribution answers this question. In this example, you have a 58% likelihood of getting no red balls and a 34% chance of selecting exactly 1 red ball. Cumulatively, this means you have a 92% chance of picking 1 red ball or fewer or, conversely, only an 8% chance of picking 2 red balls or more. In a binomial experiment there is replacement, and thus a constant probability of success with each selection. In a hypergeometric experiment, by contrast, the probability of success changes with each selection because the composition of the population N changes each time a ball is removed.
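The percentages in the ball example can be checked directly. The short Python sketch below evaluates the hypergeometric probability for each possible number of red balls, using the N, k and n notation from the definition above; the function name hypergeom_pmf is simply a label chosen for this example.

```python
from math import comb

def hypergeom_pmf(x, N, k, n):
    """Probability of exactly x successes when n items are drawn without
    replacement from N items, k of which are successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Bag of 100 balls, 10 red, 5 drawn.
probs = [hypergeom_pmf(x, N=100, k=10, n=5) for x in range(6)]

p_none = probs[0]                      # no red balls
p_one = probs[1]                       # exactly one red ball
p_two_or_more = 1 - p_none - p_one     # two or more red balls

for x, p in enumerate(probs):
    print(f"{x} red: {p:.4f}")
```

Rounded to whole percentages, p_none, p_one and p_two_or_more come out to the 58%, 34% and 8% quoted in the text.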

Within HunchLab, the balls in our bag correspond to crimes in the current time period (red) and in the historical comparison period (white). Taking a sample from this population allows us to compare the observed number of crimes in a particular catchment for the current period to the number of crimes that we would expect to see based on the Hypergeometric Distribution. Fisher's Exact Test allows us to assign a precise probability to our observed value and thus to classify an observation as a spike.
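One way to picture the catchment comparison is sketched below. The mapping is an assumption made for illustration, since the text does not spell it out: the "bag" is taken to be every qualifying crime in the whole search area across both periods, the red balls are the crimes from the current period, and the "sample" is the set of crimes that fall inside one catchment. The function name catchment_spike_p_value, the example counts and the 0.01 cutoff are all hypothetical.

```python
from math import comb

def catchment_spike_p_value(area_current, area_historical,
                            catchment_current, catchment_historical):
    """Probability of seeing at least catchment_current current-period crimes
    in the catchment if the catchment were a random sample of the area's
    crimes (assumed mapping; see the note above)."""
    N = area_current + area_historical              # all crimes in the area
    k = area_current                                # "red": current period
    n = catchment_current + catchment_historical    # crimes in the catchment
    denominator = comb(N, n)
    p_value = 0.0
    # Upper tail of the hypergeometric distribution.
    for x in range(catchment_current, min(n, k) + 1):
        p_value += comb(k, x) * comb(N - k, n - x) / denominator
    return p_value

# Hypothetical counts: 100 current and 1200 historical crimes area-wide.
p_busy = catchment_spike_p_value(100, 1200, 8, 20)
p_typical = catchment_spike_p_value(100, 1200, 2, 26)
is_spike = p_busy < 0.01   # the cutoff is an illustrative choice
```

Because the two periods differ in length, the test does not compare raw counts. It asks whether the catchment holds a larger share of current-period crime than the area-wide ratio would suggest.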

Records Management System and Computer Aided Dispatch

A Records Management System (RMS) is a computerized repository for police records, such as crime incident reports, arrest data, names and addresses, and property and evidence data. A key function of an RMS is compiling information based on Uniform Crime Reporting (UCR) and National Incident-Based Reporting System (NIBRS) codes, which can be used as parameters in Hunches. Computer Aided Dispatch (CAD) is a system that assists 911 operators and dispatch personnel in handling and prioritizing calls and in communicating with emergency services personnel. HunchLab maintains a secure environment because it works by integrating your existing data on your infrastructure, meaning that sensitive RMS and CAD data never leave your network.

RSS

RSS, an acronym for Really Simple Syndication, is a standardized, XML-based Web feed format used to publish content that is updated frequently, such as blog posts and news headlines. RSS feeds are accessed through a feed reader, or aggregator, which automatically checks for and downloads new content from users' subscribed feeds and provides an interface through which to manage and read feeds. GeoRSS extends the capabilities of RSS by defining a standard for encoding location information. GeoRSS is able to encode location data for points, lines and boundaries (boxes or polygons), and even for coordinate system information. With a GeoRSS viewer, GeoRSS feeds can be visualized as features on a map. Within HunchLab, RSS and GeoRSS feeds function as a "passive" alerting system (as opposed to the "active" alerting system of e-mail messages) that can be checked by users at any time.
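To make the GeoRSS idea concrete, the sketch below reads point locations out of a small feed using Python's standard XML parser. The feed content is invented for this example and is not actual HunchLab output; it assumes the GeoRSS-Simple encoding, in which a georss:point element holds a latitude and longitude separated by a space. The function name read_points is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by GeoRSS-Simple elements.
GEORSS_NS = {"georss": "http://www.georss.org/georss"}

# Hypothetical feed with one located item and one item without a location.
FEED = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Example alerts (hypothetical)</title>
    <item>
      <title>Example alert with a location</title>
      <georss:point>39.9526 -75.1652</georss:point>
    </item>
    <item>
      <title>Example alert without a location</title>
    </item>
  </channel>
</rss>"""

def read_points(feed_xml):
    """Return (title, latitude, longitude) for every item with a georss:point."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        point = item.findtext("georss:point", default=None, namespaces=GEORSS_NS)
        if point is None:
            continue  # plain RSS item, nothing to place on a map
        lat, lon = (float(value) for value in point.split())
        results.append((item.findtext("title", default=""), lat, lon))
    return results

points = read_points(FEED)
```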

Skinning

Skinning is a process that allows the look and feel of a custom software application to be adapted to the needs of end users. Skinning preserves the essential software functions, while reconfiguring the appearance of buttons and menus or the placement of elements within the software environment. HunchLab has been designed so that its interface can be easily adapted to the needs and preferences of clients without affecting underlying functionality.

Spatial Statistics

Spatial statistical analysis, or spatial statistics, is used to determine whether observed spatial data are typical or unexpected relative to a statistical model. HunchLab identifies aberrations in the spatial patterns of crime data by testing Statistical Hunches: comparing observed incidents of specific crime types at particular locations and time periods to the number of incidents that would be considered typical based on historical comparison data. If measures of statistical probability indicate that the number of observed incidents is unexpected or abnormal, HunchLab's automated reporting systems are triggered.

Web Map Service

The Web Map Service Interface specification is a standard from the Open Geospatial Consortium (OGC). A Web Map Service (WMS) sets up a "language" or protocol whereby map images can be requested from a piece of software that specializes in the production of map images. Mapping websites (such as HunchLab) formulate these requests and receive map "tiles" (small pieces of the map) in response; these pieces are stitched together and shown to a user as a complete map. Most prominent mapping software companies and organizations offer software that understands the WMS protocol, including ESRI, GeoServer and MapServer. By using this protocol, HunchLab is able to work with whatever mapping solution clients prefer, with no modifications required to the HunchLab code.

HunchLab uses the WMS standard to generate maps.
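A WMS map request is an ordinary URL with a standard set of query parameters. The sketch below assembles a GetMap request for WMS version 1.1.1 in Python; the server address and layer name are placeholders invented for this example, and the function name getmap_url is hypothetical. It shows the general shape of a request, not the requests HunchLab itself issues.

```python
from urllib.parse import urlencode

def getmap_url(base_url, layers, bbox, width, height,
               srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL for one map image (tile)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",                       # empty means the default style
        "SRS": srs,                         # coordinate reference system
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,                     # image size in pixels
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)

# Placeholder server and layer name, for illustration only.
url = getmap_url(
    "https://maps.example.com/wms",
    layers=["example_streets"],
    bbox=(-75.3, 39.85, -74.95, 40.15),
    width=256,
    height=256,
)
```

Because every compliant server accepts the same parameters, switching map servers in this sketch means changing only base_url and the layer names.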

Spatial Filters

Spatial Filters are a means of determining whether particular geographies are included in the search pattern for a Hunch. HunchLab permits users to define these areas interactively: by drawing a polygon on a navigable map, by geocoding an address and specifying a buffer around it, or by clicking on the map and placing a buffer around the point. These features provide additional means for law enforcement officers' specialized knowledge to be incorporated into HunchLab's automated data mining process.
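The two kinds of filter mentioned above, a drawn polygon and a buffer around a point, each reduce to a simple geometric test. The Python sketch below shows one common way to write those tests, assuming planar coordinates; the function names in_polygon and in_buffer are hypothetical and the ray-casting method is a standard technique chosen for the example, not necessarily what HunchLab uses.

```python
import math

def in_polygon(point, polygon):
    """Ray-casting test: is the point inside the polygon?

    polygon is a list of (x, y) vertices in order; the last vertex is
    implicitly joined back to the first.
    """
    x, y = point
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside

def in_buffer(point, center, radius):
    """Is the point within the given distance of the center?"""
    return math.hypot(point[0] - center[0], point[1] - center[1]) <= radius

# A square drawn on the map and a buffer of radius 2 around the origin.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
selected = [p for p in [(2, 2), (5, 2), (1, 1)]
            if in_polygon(p, square) and in_buffer(p, (0, 0), 2)]
```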

Copyright © 2008 - 2010, Azavea (formerly Avencia). All rights reserved.