Thursday, May 20, 2021

Birthday Problem without Replacement

In one of the previous posts, we saw the generalization of Coupon collector's problem without replacement. Not long after, I thought about the same generalization for the Birthday problem.

Because a Deck of cards is a better platform for dealing with problems without replacement, we use the same setup here. That is, we seek the expected number cards we need to draw (without replacement) from a well shuffled pack of cards to get two cards of the same rank.

This problem is a lot easier. We have $n=13$ ranks each with $j=4$ instances (suits). Using Hypergeometric distribution, the probability that we need more than $k$ cards

$\displaystyle \mathbb{P}(X>k)=\frac{\binom{13}{k}\cdot 4^k}{\binom{52}{k}}$

That is, of the $k$ cards we select from $52$ cards, we can have atmost one instance for each rank. Therefore, the $k$ cards should come from the $13$ ranks and for each of the card, we have $4$ choices.

Now the expected value is obtained easily.

$\displaystyle \mathbb{E}(X)=\sum_{k=0}^n \mathbb{P}(X>k)=\sum_{k=0}^n \frac{\binom{n}{k}\cdot j^k}{\binom{nj}{k}}=1+\sum_{k=1}^nj^k\binom{n}{k}\binom{nj}{k}^{-1}$

Needless to say, this agrees with the classical expression (that is at $j \to \infty$) given in Pg 417 (or Pg 433) of 'Analytic Combinatorics'. We attempt to find asymptotics when $n\to\infty$. 

Let $\displaystyle a_k=j^k\binom{n}{k}\binom{nj}{k}^{-1}$ for $k\geq 1$

Then, using the beautiful idea I learned from Hirschhorn's page (for example, papers 176 and 197),

$\begin{align}\displaystyle a_k &= j^k\binom{n}{k}\binom{nj}{k}^{-1}\\ &= \frac{n}{nj}\frac{n-1}{nj-1}\cdots\frac{n-k+1}{nj-k+1}j^k \\ &= \frac{1-\frac{0}{n}}{1-\frac{0}{nj}}\frac{1-\frac{1}{n}}{1-\frac{1}{nj}}\cdots \frac{1-\frac{k-1}{n}}{1-\frac{k-1}{nj}} \\ &\approx \text{exp}\left\{-\frac{0}{n}-\frac{1}{n}-\cdots -\frac{k-1}{n}+\frac{0}{nj}+\frac{1}{nj}+\cdots +\frac{k-1}{nj}\right\} \\ &\approx \text{exp}\left\{-\frac{1-j^{-1}}{n}\frac{k^2}{2}\right\} \\ \end{align}$

This is nothing but Laplace's method widely discussed in literature and especially in my favorite book 'Analytic Combinatorics' (page 755).

Note that we have used idea that $(1+x)^m=e^{m\log(1+x)}\approx e^{mx}$ for small $x$. Now,

$\displaystyle \mathbb{E}(X)=1+\sum_{k=1}^n a_k \approx 1+\int\limits_0^\infty \text{exp}\left\{-\frac{1-j^{-1}}{n}\frac{k^2}{2}\right\}\,dk=1+\sqrt{\frac{\pi}{2}}\sqrt{\frac{n}{1-j^{-1}}}$

where we've used the standard Normal integral. For large $j$, this clearly reduces to the classical asymptotic value.

In fact, for large $n$, the asymptotic expansion of the binomial coefficient is

$\begin{align}\displaystyle \binom{n}{k} & \approx \frac{n^k}{k!}\text{exp}\left\{-\frac{S_1(k-1)}{n}-\frac{S_2(k-1)}{2n^2}-\frac{S_3(k-1)}{3n^3}\cdots\right\} \\ & \approx \frac{n^k}{k!}\text{exp}\left\{-\frac{k(k-1)}{2n}-\frac{k^3}{6n^2}-\frac{k^4}{12n^3}\cdots\right\}\\ & \approx \frac{n^k}{k!}\text{exp}\left\{-\frac{k^2}{2n}-\frac{k^3}{6n^2}-\frac{k^4}{12n^3}\cdots\right\}\\ \end{align}$

where $S_r(m)$ is the sum of the $r$-th powers of the first $m$ natural numbers.

Using the second expression and simplifying a bit more, we have

$\displaystyle a_k \approx \text{exp}\left\{\frac{1-j^{-1}}{8n}\right\}\text{exp}\left\{-\frac{(1-j^{-1})(k-\frac{1}{2})^2}{2n}\right\}\left(1-\frac{1-j^{-2}}{6n^2}k^3-\frac{1-j^{-3}}{12n^3}k^4+\frac{1}{2}\frac{(1-j^{-2})^2}{36n^4}k^6+\cdots\right)$

If we now use the Normal approximation, we get

$\begin{align}\displaystyle \mathbb{E}(X) &\approx 1+\text{exp}\left\{\frac{1-j^{-1}}{8n}\right\}\left(\sqrt{\frac{n\pi/2}{1-j^{-1}}}-\frac{1}{3}\frac{j+1}{j-1}-\frac{1}{4}\frac{1-j^{-3}}{(1-j^{-1})^{5/2}}\sqrt{\frac{\pi}{2n}}+\frac{5}{24}\frac{(1-j^{-2})^2}{(1-j^{-1})^{7/2}}\sqrt{\frac{\pi}{2n}}\right) \\ &= 1+\text{exp}\left\{\frac{1-j^{-1}}{8n}\right\}\left(\sqrt{\frac{n\pi/2}{1-j^{-1}}}-\frac{1}{3}\frac{j+1}{j-1}-\frac{1}{24}\frac{1-4j^{-1}+j^{-2}}{(1-j^{-1})^2}\sqrt{\frac{1-j^{-1}}{2n/\pi}} \right) \\ &\approx 1+\sqrt{\frac{n\pi/2}{1-j^{-1}}}-\frac{1}{3}\frac{j+1}{j-1}+\frac{1}{12}\frac{1-j^{-1}+j^{-2}}{(1-j^{-1})^2}\sqrt{\frac{1-j^{-1}}{2n/\pi}} \\ \end{align}$

which I think is an $O(n^{-1})$ approximation. Hope you liked the discussion.

Clear["Global`*"];
n = 10000; j = 7;
acc = 50;
res = N[1, acc];
k = 1; nume = N[n, acc]; deno = N[n, acc];
Monitor[Do[
     res += N[nume/deno, acc];
     nume *= (n - k); deno *= (n - k/j);
     , {k, n}];, {k, res}]; // AbsoluteTiming
res
N[1 + Exp[(1 - j^-1)/(
    8 n)] (Sqrt[(n \[Pi]/2)/(1 - j^-1)] - 1/3 (1 + j^-1)/(1 - j^-1) - 
     1/24 (1 - 4 j^-1 + j^-2)/(1 - j^-1)^2 Sqrt[(1 - j^-1)/(
      2 n/\[Pi])]), acc]


Until then
Yours Aye
Me

No comments:

Post a Comment