Hypergeometric Distribution
The Hypergeometric Distribution applies when a sample is drawn without replacement from a finite population whose items fall into two classes. It describes the probability distribution of a hypergeometric random variable, defined as the number of successes in a sample of size n drawn without replacement from a population of N items, of which k are considered successes and N-k are considered failures.
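For reference, the definition above corresponds to the standard probability mass function shown here. N, k, and n are as defined in the text; the symbol x, for the number of successes in the sample, is introduced only for this formula.

```latex
P(X = x) = \frac{\binom{k}{x}\,\binom{N-k}{n-x}}{\binom{N}{n}},
\qquad \max(0,\; n-(N-k)) \le x \le \min(n,\; k)
```

The numerator counts the ways to choose x successes and n-x failures; the denominator counts all possible samples of size n.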
Think of a bag of 100 balls: 90 white and 10 red. Suppose we draw 5 balls at random without looking. What are our chances of getting 0 red balls? 1 red ball? 2 red balls, and so on up to 5? The Hypergeometric Distribution answers these questions. In this example, we have about a 58% chance of drawing no red balls and about a 34% chance of drawing exactly 1 red ball. Cumulatively, that is about a 92% chance of drawing 1 red ball or fewer or, conversely, only about an 8% chance of drawing 2 red balls or more. In a binomial experiment, sampling is done with replacement, so the probability of success is the same on every draw. In a hypergeometric experiment, the probability of success changes from one draw to the next because each draw changes the composition of the remaining population.
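The figures in the ball example can be checked with a short calculation. The following is a minimal sketch using only the Python standard library (math.comb requires Python 3.8 or later); the function name hypergeom_pmf and the variable names are illustrative choices, not part of any product.

```python
from math import comb


def hypergeom_pmf(x, N, k, n):
    """Probability of exactly x successes in n draws without replacement
    from a population of N items that contains k successes."""
    # Values of x outside the feasible range have probability zero.
    if x < max(0, n - (N - k)) or x > min(n, k):
        return 0.0
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


# Ball example from the text: 100 balls, 10 red (successes), draw 5.
N, k, n = 100, 10, 5
pmf = [hypergeom_pmf(x, N, k, n) for x in range(n + 1)]

for x, p in enumerate(pmf):
    print(f"P({x} red) = {p:.4f}")

at_most_one = pmf[0] + pmf[1]
print(f"P(1 red or fewer) = {at_most_one:.4f}")
print(f"P(2 red or more)  = {1 - at_most_one:.4f}")
```

Rounded to whole percentages, the first two probabilities and the two cumulative values match the 58%, 34%, 92%, and 8% quoted above.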
Within HunchLab, the balls in our bag correspond to crimes: those in the current time period are red, and those in the historical comparison period are white. Taking a sample from this population allows us to compare the number of current-period crimes observed in a particular catchment with the number we would expect to see under the Hypergeometric Distribution. Fisher's Exact Test assigns an exact probability to the observed value, which lets us classify an observation as a spike when that probability is sufficiently small.
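The comparison described above can be sketched as a one-sided test on the upper tail of the Hypergeometric Distribution. This is an illustration under stated assumptions, not HunchLab's actual implementation: it assumes the sample is the set of all crimes (both periods) falling in the catchment, and the function names, the counts, and the 0.05 threshold are all hypothetical.

```python
from math import comb


def upper_tail_p(x, N, k, n):
    """P(X >= x) for a hypergeometric variable: the chance of seeing x or
    more successes in a sample of n drawn from N items with k successes."""
    lo = max(x, 0, n - (N - k))   # smallest feasible count that is >= x
    hi = min(n, k)                # largest feasible count
    ways = sum(comb(k, i) * comb(N - k, n - i) for i in range(lo, hi + 1))
    return ways / comb(N, n)


def is_spike(x, N, k, n, alpha=0.05):
    """Flag a spike when the upper-tail probability is below alpha.
    The threshold alpha is an assumed value, chosen only for illustration."""
    return upper_tail_p(x, N, k, n) < alpha


# Hypothetical counts, invented for this example:
N = 1000   # all crimes, current plus historical periods (balls in the bag)
k = 100    # crimes in the current period (red balls)
n = 50     # crimes in the catchment across both periods (the sample)
x = 12     # current-period crimes observed in the catchment

expected = n * k / N           # mean of the distribution: 5 crimes
p_value = upper_tail_p(x, N, k, n)
print(f"expected = {expected:.1f}, observed = {x}, p = {p_value:.5f}")
print("spike" if is_spike(x, N, k, n) else "no spike")
```

With these invented counts the observed value is well above the expected 5, so the upper-tail probability is small and the observation is flagged. A different mapping of crimes to the population and sample would change the inputs but not the calculation.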