Hypergeometric distribution


Story

Consider an urn with \(a\) white balls and \(b\) black balls. Draw \(N\) balls from this urn without replacement. The number of white balls drawn, \(n\), is Hypergeometrically distributed.


Example

There are \(a+b\) finches on an island, and \(a\) of them are tagged (and therefore \(b\) of them are untagged). You capture \(N\) finches. The number of tagged finches \(n\) is Hypergeometrically distributed.
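The finch story can be simulated directly with the standard library, as a minimal sketch. The population sizes, the seed, and the helper name `draw_tagged` are illustrative assumptions and are not part of the example above.

```python
import random


def draw_tagged(a, b, N, rng):
    """Capture N finches without replacement; return how many are tagged."""
    # Population: 1 marks a tagged finch, 0 an untagged one.
    population = [1] * a + [0] * b
    return sum(rng.sample(population, N))


rng = random.Random(3252)  # arbitrary seed, used only for reproducibility
a, b, N = 30, 70, 20       # illustrative values (assumed)

# Repeat the capture many times and record the tagged count each time.
samples = [draw_tagged(a, b, N, rng) for _ in range(10_000)]
sample_mean = sum(samples) / len(samples)

# The sample mean should land near N * a / (a + b) = 6.
print(sample_mean)
```

Each entry of `samples` is one realization of \(n\); a histogram of these counts approximates the distribution described in the story.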


Parameters

There are three parameters: the number of draws \(N\), the number of white balls \(a\), and the number of black balls \(b\).


Support

The Hypergeometric distribution is supported on the set of integers between \(\mathrm{max}(0, N-b)\) and \(\mathrm{min}(N, a)\), inclusive.
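The bounds follow from counting: if \(N > b\), at least \(N - b\) of the draws must be white, and the number of white balls drawn can exceed neither \(N\) nor \(a\). A small sketch of this, where the helper name `hypergeom_support` is hypothetical:

```python
def hypergeom_support(N, a, b):
    """Return the range of possible white-ball counts n."""
    lo = max(0, N - b)  # the black balls can account for at most b of the draws
    hi = min(N, a)      # cannot draw more white balls than draws or than exist
    return range(lo, hi + 1)


# With 3 white and 4 black balls, drawing 5 forces at least one white ball.
print(list(hypergeom_support(N=5, a=3, b=4)))  # [1, 2, 3]
```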


Probability mass function

\[ f(n; N, a, b) = \frac{\binom{a}{n} \binom{b}{N-n}}{\binom{a+b}{N}}. \]
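The PMF can be evaluated term by term with exact integer binomial coefficients. This is a minimal sketch rather than a library implementation; the function name `hypergeom_pmf` and the parameter values are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_pmf(n, N, a, b):
    """Evaluate f(n; N, a, b) for a white and b black balls and N draws."""
    # Outside the support the probability is zero.
    if n < max(0, N - b) or n > min(N, a):
        return 0.0
    # Ways to pick n white and N - n black, over all ways to pick N balls.
    return comb(a, n) * comb(b, N - n) / comb(a + b, N)


N, a, b = 5, 3, 4  # illustrative values (assumed)
probs = [hypergeom_pmf(n, N, a, b) for n in range(N + 1)]
print(probs)
print(sum(probs))  # should be 1 up to floating-point rounding
```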

Moments

Mean: \(\displaystyle{N\,\frac{a}{a+b}}\)

Variance: \(\displaystyle{N\,\frac{ab}{(a + b)^2}\,\frac{a+b-N}{a+b-1}}\)
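As a sanity check, the closed-form mean and variance can be compared with the values obtained by summing over the PMF. The sketch below assumes illustrative parameter values and a hypothetical helper `pmf`.

```python
from math import comb


def pmf(n, N, a, b):
    """Hypergeometric PMF for n inside the support."""
    return comb(a, n) * comb(b, N - n) / comb(a + b, N)


N, a, b = 20, 30, 70  # illustrative values (assumed)
support = range(max(0, N - b), min(N, a) + 1)

# Moments computed directly from the definition.
mean = sum(n * pmf(n, N, a, b) for n in support)
var = sum((n - mean) ** 2 * pmf(n, N, a, b) for n in support)

# Moments from the closed-form expressions.
mean_formula = N * a / (a + b)
var_formula = N * a * b / (a + b) ** 2 * (a + b - N) / (a + b - 1)

print(mean, mean_formula)
print(var, var_formula)
```

The two values in each printed pair should agree to within floating-point error.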


Usage

Package    Syntax
NumPy      rg.hypergeometric(a, b, N)
SciPy      scipy.stats.hypergeom(a+b, a, N)
Stan       hypergeometric(N, a, b)
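The NumPy and SciPy calls can be exercised as follows. This sketch assumes that `rg` refers to a NumPy random generator created with `np.random.default_rng()`; the seed and parameter values are illustrative.

```python
import numpy as np
import scipy.stats

a, b, N = 30, 70, 20  # illustrative values (assumed)

# NumPy: arguments are (white, black, draws).
rg = np.random.default_rng(seed=3252)  # assumed definition of rg
samples = rg.hypergeometric(a, b, N, size=1000)

# SciPy: arguments are (total balls, white, draws).
dist = scipy.stats.hypergeom(a + b, a, N)

print(samples[:10])
print(dist.pmf(6))              # probability of drawing exactly 6 white balls
print(dist.mean(), dist.var())  # the mean is N * a / (a + b) = 6
```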



Notes

  • This distribution is analogous to the Binomial distribution, except that the Binomial distribution describes draws from an urn with replacement. In the analogy, the Binomial parameter \(\theta\) is \(\theta = a/(a+b)\).

  • SciPy uses a different parametrization than NumPy and Stan. Let \(M = a+b\) be the total number of balls in the urn. Writing the parameters in the order that scipy.stats.hypergeom expects, \((M, a, N)\), the PMF is

\[ f(n; M, a, N) = \frac{\binom{a}{n} \binom{M-a}{N-n}}{\binom{M}{N}}. \]
  • Although NumPy and Stan use the same parametrization, the order of their arguments differs: NumPy takes (a, b, N), whereas Stan takes (N, a, b).
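The notes above can be checked numerically. The sketch below evaluates the PMF in the \((N, a, b)\) parametrization by hand, evaluates it again through SciPy's \((M, a, N)\) ordering, and prints the Binomial PMF with \(\theta = a/(a+b)\) for comparison. The helper name `pmf_urn` and the parameter values are assumptions.

```python
from math import comb

import scipy.stats


def pmf_urn(n, N, a, b):
    """PMF in the (N, a, b) parametrization, for n inside the support."""
    return comb(a, n) * comb(b, N - n) / comb(a + b, N)


N, a, b = 20, 30, 70  # illustrative values (assumed)
M = a + b             # total number of balls, as SciPy expects
n = 6

p_urn = pmf_urn(n, N, a, b)
p_scipy = scipy.stats.hypergeom.pmf(n, M, a, N)  # note the argument order
print(p_urn, p_scipy)  # the two parametrizations should agree

# Binomial analogue: the same number of draws, but with replacement.
theta = a / (a + b)
p_binom = scipy.stats.binom.pmf(n, N, theta)
print(p_binom)
```

The Binomial value is not expected to match the Hypergeometric one exactly, since drawing without replacement changes the urn's composition after each draw.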


PMF and CDF plots