Wednesday, March 23, 2022

Notes on the Hypergeometric Distribution and its Negative

I wanted to create a post about the Hypergeometric distribution, one of the most important distributions in probability theory, as a prelude to my next post.

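The probability mass function of a Hypergeometric distribution with parameters $N$ (population size), $K$ (number of successes in the population) and $n$ (number of draws, made without replacement) is given by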

$\displaystyle f(k;N,K,n)=\mathbb{P}(X=k)=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$

The distribution has several symmetries which will be the key content of this post.

To start with, the roles of successes and failures can be swapped. In terms of probability, this means that $f(k;N,K,n)=f(n-k;N,N-K,n)$. The two binomial coefficients in the numerator of $f(k;N,K,n)$ simply trade places under this transformation.

Next, the roles of drawn and not-drawn elements can be swapped. In terms of probability, this gives $f(k;N,K,n)=f(K-k;N,K,N-n)$. This amounts to invoking the symmetry of the binomial coefficient, $\binom{n}{k}=\binom{n}{n-k}$, on each of the three coefficients.

Finally, the roles of drawn elements and successes can be swapped as well. In probability terms, this becomes $f(k;N,K,n)=f(k;N,n,K)$. In other words,

$\displaystyle \mathbb{P}(X=k)=\binom{n}{k}\frac{\binom{N-n}{K-k}}{\binom{N}{K}}$

I find this pretty for two reasons. Firstly, there are two 'nCk' terms with the lowercase pair in the numerator and the uppercase in the denominator which gives it a nice structure. Secondly, and most importantly, this brings out the parallel with the Binomial distribution.
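The three symmetries and the rewritten form above can be checked with exact rational arithmetic. This is only a minimal sketch: the helper name `hypergeom_pmf` and the values $N=20$, $K=7$, $n=5$ are my own choices for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Exact P(X = k): k successes in n draws from N items, K of them successes."""
    # Outside the support the probability is zero (and comb would reject negatives).
    if k < 0 or k > n or k > K or n - k > N - K:
        return Fraction(0)
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

N, K, n = 20, 7, 5  # arbitrary illustrative values
for k in range(n + 1):
    base = hypergeom_pmf(k, N, K, n)
    assert base == hypergeom_pmf(n - k, N, N - K, n)  # successes <-> failures
    assert base == hypergeom_pmf(K - k, N, K, N - n)  # drawn <-> not drawn
    assert base == hypergeom_pmf(k, N, n, K)          # draws <-> successes
    # The rewritten form with binom(n, k) pulled out in front.
    assert base == Fraction(comb(n, k) * comb(N - n, K - k), comb(N, K))
```

Using `Fraction` rather than floats means the equalities are tested exactly, not up to rounding.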

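Comparing the probability of $k$ successes which we have above with that of a $\text{Bin}(n,K/N)$ variable, namely $\binom{n}{k}\left(\frac{K}{N}\right)^k\left(1-\frac{K}{N}\right)^{n-k}$, and cancelling the common factor $\binom{n}{k}$, we have,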

$\displaystyle \frac{\binom{N-n}{K-k}}{\binom{N}{K}} \approx \left(\frac{K}{N}\right)^k\left(1-\frac{K}{N}\right)^{n-k}$

where $N \gg n$ and the ratio $K/N$ is held fixed, strictly between 0 and 1. This can be seen as an asymptotic expression for the ratio of two binomial coefficients.
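A short sketch of why this holds: expanding both binomial coefficients into factorials and cancelling leaves $n$ factors on top and $n$ on the bottom,

$\displaystyle \frac{\binom{N-n}{K-k}}{\binom{N}{K}}=\frac{\prod_{a=0}^{k-1}(K-a)\,\prod_{b=0}^{n-k-1}(N-K-b)}{\prod_{c=0}^{n-1}(N-c)}$

(the index names $a$, $b$, $c$ are introduced here only for illustration). When $N \gg n$ and $K/N$ is fixed, each $K-a$ is close to $K$, each $N-K-b$ is close to $N-K$, and each $N-c$ is close to $N$, so the right-hand side is approximately $K^k(N-K)^{n-k}/N^n$, which is exactly the binomial expression.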

To make better use of this asymptotic expression, let $K=pN$ and $p+q=1$, and write $i=k$ and $j=n-k$. Then, the above can be rewritten as

$\displaystyle p^iq^j \approx \frac{\binom{N-i-j}{K-i}}{\binom{N}{K}}$
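As a rough numerical sanity check of this approximation, the ratio can be compared with $p^iq^j$ for growing $N$. This is a sketch only: $p=0.3$, $i=2$, $j=3$ and the helper name `coefficient_ratio` are assumptions picked purely for illustration.

```python
from fractions import Fraction
from math import comb

def coefficient_ratio(N, K, i, j):
    """Exact value of binom(N-i-j, K-i) / binom(N, K)."""
    return Fraction(comb(N - i - j, K - i), comb(N, K))

p = Fraction(3, 10)  # illustrative success fraction K/N
q = 1 - p
i, j = 2, 3          # illustrative exponents
target = float(p**i * q**j)

errors = []
for N in (100, 1_000, 10_000):
    K = int(p * N)   # p*N is an integer for these N
    approx = float(coefficient_ratio(N, K, i, j))
    errors.append(abs(approx - target) / target)
    print(N, approx, target)
```

The relative error should shrink roughly in proportion to $1/N$, which is what the product form of the ratio suggests.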

Lastly, all the above symmetries can be brought under a single umbrella by using a beautiful idea from "Duality and Symmetry in the Hypergeometric distribution". Surprisingly, this idea is relatively unknown. The authors show that the probability associated with the Hypergeometric distribution can also be written as,

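$\displaystyle \mathbb{P}(X=k)=\frac{\binom{N}{k,n-k,K-k}}{\binom{N}{n}\binom{N}{K}}$

where the numerator should be understood as the multinomial coefficient $\frac{N!}{k!\,(n-k)!\,(K-k)!\,(N-n-K+k)!}$, with the fourth part $N-n-K+k$ left implicit in the notation. Written this way, the expression is visibly unchanged when $n$ and $K$ are exchanged.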
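The multinomial form can be compared against the standard form with exact arithmetic. Again this is just a sketch: the function names and the values $N=20$, $K=7$, $n=5$ are hypothetical choices of mine, not taken from the paper.

```python
from fractions import Fraction
from math import comb, factorial

def multinomial(total, parts):
    """total! / (parts[0]! * parts[1]! * ...); the parts must sum to total."""
    assert sum(parts) == total and all(part >= 0 for part in parts)
    result = factorial(total)
    for part in parts:
        result //= factorial(part)  # each division is exact
    return result

def pmf_multinomial_form(k, N, K, n):
    # The four parts: drawn successes, drawn failures,
    # undrawn successes, undrawn failures.
    parts = [k, n - k, K - k, N - n - K + k]
    return Fraction(multinomial(N, parts), comb(N, n) * comb(N, K))

def pmf_standard_form(k, N, K, n):
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

N, K, n = 20, 7, 5  # arbitrary illustrative values
matches = [
    pmf_multinomial_form(k, N, K, n) == pmf_standard_form(k, N, K, n)
    for k in range(n + 1)
]
print(all(matches))
```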

Finally, we come to visit the lesser-known relative of the Hypergeometric distribution: the Negative Hypergeometric distribution. Just as the Hypergeometric parallels the Binomial, the Neg. Hypergeometric parallels the Neg. Binomial distribution.

We could use the exact idea of Neg. Binomial to derive the probability expression for Neg. Hypergeometric. But considering what we've seen so far, we could do better.

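Let $Y$ be a Neg. Binomial variable that counts the number of failures encountered before the $r$th success, and let $W$ be the corresponding Neg. Hypergeometric variable, where the draws are made without replacement from a population of $N$ elements containing $K$ successes. We know that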

$\displaystyle \mathbb{P}(Y=k)=\binom{k+r-1}{k}p^rq^k$

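Now we just have to replace $p^rq^k$ with the asymptotic expression above, taking $i=r$ and $j=k$, to derive the same for the Neg. Hypergeometric case. The substitution is only heuristic, but the result turns out to be the exact probability. We have,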

$\displaystyle \mathbb{P}(W=k)=\binom{k+r-1}{k}\frac{\binom{N-k-r}{K-r}}{\binom{N}{K}}$

Neat, isn't it?
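Since the Neg. Hypergeometric expression above was reached through an approximation, it is worth checking it against a first-principles argument: exactly $r-1$ successes in the first $k+r-1$ draws, followed by a success on the next draw. A minimal sketch, with function names and the values $N=20$, $K=7$, $r=3$ being my own illustrative assumptions:

```python
from fractions import Fraction
from math import comb

def neg_hypergeom_pmf(k, N, K, r):
    """P(W = k): k failures before the r-th success, drawing without replacement."""
    return Fraction(comb(k + r - 1, k) * comb(N - k - r, K - r), comb(N, K))

def sequential_pmf(k, N, K, r):
    """Same probability from first principles: r-1 successes in the first
    k+r-1 draws (a Hypergeometric event), then a success on the next draw."""
    draws = k + r - 1
    first = Fraction(comb(K, r - 1) * comb(N - K, k), comb(N, draws))
    return first * Fraction(K - r + 1, N - draws)

N, K, r = 20, 7, 3           # arbitrary illustrative values
support = range(N - K + 1)   # at most N-K failures can occur
total = sum(neg_hypergeom_pmf(k, N, K, r) for k in support)
agree = all(
    neg_hypergeom_pmf(k, N, K, r) == sequential_pmf(k, N, K, r)
    for k in support
)
print(total, agree)
```

If the two functions agree on the whole support and the probabilities sum to one, the heuristic substitution has landed on the exact distribution for these parameter values.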


Until then
Yours Aye
Me