Thursday, May 20, 2021

Birthday Problem without Replacement

In one of the previous posts, we saw the generalization of Coupon collector's problem without replacement. Not long after, I thought about the same generalization for the Birthday problem.

Because a Deck of cards is a better platform for dealing with problems without replacement, we use the same setup here. That is, we seek the expected number cards we need to draw (without replacement) from a well shuffled pack of cards to get two cards of the same rank.

This problem is a lot easier. We have $n=13$ ranks each with $j=4$ instances (suits). Using Hypergeometric distribution, the probability that we need more than $k$ cards

$\displaystyle \mathbb{P}(X>k)=\frac{\binom{13}{k}\cdot 4^k}{\binom{52}{k}}$

That is, of the $k$ cards we select from $52$ cards, we can have atmost one instance for each rank. Therefore, the $k$ cards should come from the $13$ ranks and for each of the card, we have $4$ choices.

Now the expected value is obtained easily.

$\displaystyle \mathbb{E}(X)=\sum_{k=0}^n \mathbb{P}(X>k)=\sum_{k=0}^n \frac{\binom{n}{k}\cdot j^k}{\binom{nj}{k}}=1+\sum_{k=1}^nj^k\binom{n}{k}\binom{nj}{k}^{-1}$

Needless to say, this agrees with the classical expression (that is at $j \to \infty$) given in Pg 417 (or Pg 433) of 'Analytic Combinatorics'. We attempt to find asymptotics when $n\to\infty$. 

Let $\displaystyle a_k=j^k\binom{n}{k}\binom{nj}{k}^{-1}$ for $k\geq 1$

Then, using the beautiful idea I learned from Hirschhorn's page (for example, papers 176 and 197),

$\begin{align}\displaystyle a_k &= j^k\binom{n}{k}\binom{nj}{k}^{-1}\\ &= \frac{n}{nj}\frac{n-1}{nj-1}\cdots\frac{n-k+1}{nj-k+1}j^k \\ &= \frac{1-\frac{0}{n}}{1-\frac{0}{nj}}\frac{1-\frac{1}{n}}{1-\frac{1}{nj}}\cdots \frac{1-\frac{k-1}{n}}{1-\frac{k-1}{nj}} \\ &\approx \text{exp}\left\{-\frac{0}{n}-\frac{1}{n}-\cdots -\frac{k-1}{n}+\frac{0}{nj}+\frac{1}{nj}+\cdots +\frac{k-1}{nj}\right\} \\ &\approx \text{exp}\left\{-\frac{1-j^{-1}}{n}\frac{k^2}{2}\right\} \\ \end{align}$

This is nothing but Laplace's method widely discussed in literature and especially in my favorite book 'Analytic Combinatorics' (page 755).

Note that we have used idea that $(1+x)^m=e^{m\log(1+x)}\approx e^{mx}$ for small $x$. Now,

$\displaystyle \mathbb{E}(X)=1+\sum_{k=1}^n a_k \approx 1+\int\limits_0^\infty \text{exp}\left\{-\frac{1-j^{-1}}{n}\frac{k^2}{2}\right\}\,dk=1+\sqrt{\frac{\pi}{2}}\sqrt{\frac{n}{1-j^{-1}}}$

where we've used the standard Normal integral. For large $j$, this clearly reduces to the classical asymptotic value.

In fact, for large $n$, the asymptotic expansion of the binomial coefficient is

$\begin{align}\displaystyle \binom{n}{k} & \approx \frac{n^k}{k!}\text{exp}\left\{-\frac{S_1(k-1)}{n}-\frac{S_2(k-1)}{2n^2}-\frac{S_3(k-1)}{3n^3}\cdots\right\} \\ & \approx \frac{n^k}{k!}\text{exp}\left\{-\frac{k(k-1)}{2n}-\frac{k^3}{6n^2}-\frac{k^4}{12n^3}\cdots\right\}\\ & \approx \frac{n^k}{k!}\text{exp}\left\{-\frac{k^2}{2n}-\frac{k^3}{6n^2}-\frac{k^4}{12n^3}\cdots\right\}\\ \end{align}$

where $S_r(m)$ is the sum of the $r$-th powers of the first $m$ natural numbers.

Using the second expression and simplifying a bit more, we have

$\displaystyle a_k \approx \text{exp}\left\{\frac{1-j^{-1}}{8n}\right\}\text{exp}\left\{-\frac{(1-j^{-1})(k-\frac{1}{2})^2}{2n}\right\}\left(1-\frac{1-j^{-2}}{6n^2}k^3-\frac{1-j^{-3}}{12n^3}k^4+\frac{1}{2}\frac{(1-j^{-2})^2}{36n^4}k^6+\cdots\right)$

If we now use the Normal approximation, we get

$\begin{align}\displaystyle \mathbb{E}(X) &\approx 1+\text{exp}\left\{\frac{1-j^{-1}}{8n}\right\}\left(\sqrt{\frac{n\pi/2}{1-j^{-1}}}-\frac{1}{3}\frac{j+1}{j-1}-\frac{1}{4}\frac{1-j^{-3}}{(1-j^{-1})^{5/2}}\sqrt{\frac{\pi}{2n}}+\frac{5}{24}\frac{(1-j^{-2})^2}{(1-j^{-1})^{7/2}}\sqrt{\frac{\pi}{2n}}\right) \\ &= 1+\text{exp}\left\{\frac{1-j^{-1}}{8n}\right\}\left(\sqrt{\frac{n\pi/2}{1-j^{-1}}}-\frac{1}{3}\frac{j+1}{j-1}-\frac{1}{24}\frac{1-4j^{-1}+j^{-2}}{(1-j^{-1})^2}\sqrt{\frac{1-j^{-1}}{2n/\pi}} \right) \\ &\approx 1+\sqrt{\frac{n\pi/2}{1-j^{-1}}}-\frac{1}{3}\frac{j+1}{j-1}+\frac{1}{12}\frac{1-j^{-1}+j^{-2}}{(1-j^{-1})^2}\sqrt{\frac{1-j^{-1}}{2n/\pi}} \\ \end{align}$

which I think is an $O(n^{-1})$ approximation. Hope you liked the discussion.

Clear["Global`*"];
n = 10000; j = 7;
acc = 50;
res = N[1, acc];
k = 1; nume = N[n, acc]; deno = N[n, acc];
Monitor[Do[
     res += N[nume/deno, acc];
     nume *= (n - k); deno *= (n - k/j);
     , {k, n}];, {k, res}]; // AbsoluteTiming
res
N[1 + Exp[(1 - j^-1)/(
    8 n)] (Sqrt[(n \[Pi]/2)/(1 - j^-1)] - 1/3 (1 + j^-1)/(1 - j^-1) - 
     1/24 (1 - 4 j^-1 + j^-2)/(1 - j^-1)^2 Sqrt[(1 - j^-1)/(
      2 n/\[Pi])]), acc]


Until then
Yours Aye
Me

Friday, May 14, 2021

Coupon collector's Problem without Replacement

In this post, we seek the expected number of coupons needed to complete the set where the number of coupons in each type is finite and we sample without replacement.

I recently got interested in this question when I saw a post on Reddit that asks for the expected number of cards that has to be drawn without replacement from deck of cards to get all the four suits. This is essentially the coupon collector's problem with 4 different coupon types with 13 coupons available in each type.

Let's first discuss a simpler question. What is the expected number of cards to be drawn without replacement from a well shuffled deck to collect the first Ace? This is exactly like that of the Geometric distribution but without replacement.

We can employ symmetry to simplify this problem. The four Aces would divide the remaining 48 cards into five equal sets. Therefore each set would contain 48/5=9.6 cards. Thus, including the first Ace we collect, we need 10.6 draws without replacement.

What we've discussed here is the Negative Hypergeometric distribution which deals with the expected number of draws without replacement needed to get $r$-th success. Analogous to the Geometric distribution, let's also define the Negative Hypergeometric distribution with $r=1$ as the Negative Geometric distribution.

If $X$ is a Negative Geometric random variable denoting the number of draws without replacement needed to get the first success, then based on our discussion above, we have

$\displaystyle\mathbb{E}(X)=\frac{N-K}{K+1}+1=\frac{N+1}{K+1}$

where $N$ is the population size and $K$ is the number of "successes" in the population.

That is all we need to solve our original problem. Even though, finding the expected number of draws without replacement to get all suits is solved a lot of times in Stack Exchange (For example, here, here and here by Marko Reidel with Generating functions), we are going to use the amazing Maximums-Minimums Identity approach used here.

Let $X_1$, $X_2$, $X_3$ and $X_4$ be the random number of draws without replacement needed to get the first Spades, Clubs, Hearts and Diamonds respectively. Then the random number $X=\text{max}(X_1,X_2,X_3,X_4)$ denotes the number of draws to get all the four suits.

Note that each $X_i$ is a Negative geometric variable and the minimum of any bunch is again Negative Geometric with the number of successes pooled together. Using the Max-Min Identity and the linearity of expectations, we have

$\begin{align}\displaystyle\mathbb{E}[X]&=\mathbb{E}[\text{max }X_i]\\ &=\sum_i \mathbb{E}[X_i] - \sum_{i<j}\mathbb{E}[\text{min}(X_i,X_j)]+\sum_{i<j<k}\mathbb{E}[\text{min}(X_i,X_j,X_k)]-\cdots\\ &= \binom{4}{1}\frac{52+1}{13+1}-\binom{4}{2}\frac{52+1}{26+1}+\binom{4}{3}\frac{52+1}{39+1}-\binom{4}{4}\frac{52+1}{52+1}\\ &= \frac{4829}{630} \approx 7.66508\end{align}$

Though we have solved the question in the case where each type has an equal number of coupons, it should be easy to see that this approach is generalizable easily.

For the case of $n$ different coupon types with $j$ coupons in each type, we have,

$\displaystyle \mathbb{E}(X)=\sum_{k=1}^n(-1)^{k-1}\binom{n}{k}\frac{nj+1}{kj+1}=(nj+1)\left[1-\binom{n+1/j}{n}^{-1}\right]$

which is the closed form solution of the finite version of the coupon collector's problem. We know that as $j\to \infty$, this reduces to the classical coupon collector problem. For case of $n\to \infty$, using the Asymptotic bounds of Binomial Coefficient, we can write,

$\displaystyle \mathbb{E}(X) \approx nj \left[1-\binom{n+1/j}{n}^{-1} \right] \approx nj - \Gamma(1/j)n^{1-1/j}$

Even though I knew how the Max-Min Identity is useful in terms of the coupon collector problem with replacement, it took me almost a day of on-and-off thinking before I could convince myself that we can make it work for the without replacement case as well. And it was a pleasant surprise that we could arrive at a nice asymptotic expression for the expectation.


Until Then
Yours Aye
Me