The Birthday problem is one of the classic problems in Probability theory concerning the probability that, in a set of $n$ randomly chosen people, some pair will have a the same birthday. The idea in this post is to generalize this setup and find the expected value in this generalization.

The generalization here is this: Assume that there is a very long line of persons ready to enter a very large room one by one. Each person is let in and declares her birthday upon entering the room. How many people, on average, must enter in order to find $k$ people that have birthdays that are $d$ days apart (given an $r$-day year)?

Well, when we say 'approximation', we are immediately reminded of the Poisson approximation of the Birthday problem in Wikipedia. But the idea here is that most of the approximations I found in the web concerns only about the probability and not about the expected value. A special case of this problem is solved in Strategic Practice 5 of the (amazing) Stat 110 course by the (extraordinary) Prof. Joseph Blitzstein. And I got curious as to how good the approximation really is.

Let $\mathbb{E}(k,d,r)$ be the expected number of people that have to enter room such that we have $k$ people with birthdays $d$ days apart for the first time.

Let $X_n$ be the number of $k$-set birthday matches in when $n$ people have entered the room. By the Poisson paradigm, $X_n$ can be considered to be a poisson variable. The parameter $\lambda_n$ for this variable is given by,

$\displaystyle\lambda_n=\displaystyle\binom{n}{k}\cdot\frac{1}{(k-1)!}\left(\frac{2d+1}{r}\right)^{k-1}$ (why?)

Therefore we immediately have, $\mathbb{P}(X_n=0)=e^{-\lambda_n}$

Now by a simple trick common in evaluating expected values, we have,

$\begin{align}

\displaystyle\mathbb{E}(k,d,r)&=k+\sum_{n=k}^\infty\mathbb{P}(X_n=0)=k+\displaystyle\sum_{n=k}^\infty e^{-\lambda_n}

\end{align}$

Though we have an infinite sum, the numbers decrease rapidly because of the exponential factor. For the values given the said Project euler problem, this approximation gives $\mathbb{E}_{\text{Poisson}}(3,1,10)\approx 6.15092$ and $\mathbb{E}_{\text{Poisson}}(3,7,100)\approx 8.79751$

But we can take it a little further. When the parameter of a Poisson variable get very large, then the Normal variable with the same mean and variance as the Poisson variable gives better accuracy. Therefore,

$\begin{align}

\displaystyle\mathbb{E}(k,d,r)&=k+\sum_{n=k}^\infty\mathbb{P}(X_n=0)\\

&=k+\displaystyle\sum_{n=k}^\infty \mathbb{P}(-1/2 \leq X_n \leq 1/2)

\end{align}$

where $X_n \sim \mathcal{N}(\lambda_n,\lambda_n)$ along with the continuity correction.

Here again, the numbers decrease fast and the sum converges in about fifty terms. Now we have $\mathbb{E}_{\text{Normal}}(3,1,10)\approx 5.68554$ and $\mathbb{E}_{\text{Normal}}(3,7,100)\approx 8.04791$

In fact for the required answer, the normal approximation gives $\mathbb{E}_{\text{Normal}}(4,7,365)\approx 33.6396$ with a relative error of just about $2.5\%$. Not bad.

These values were calculated with the following Mathematica code.

Clear["Global`*"];

PoissonApprox[k_, d_, r_] :=

k + Sum[N[

Exp[-Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/

Factorial[k - 1]]], {n, k, 100}];

NormalApprox[k_, d_, r_] :=

k + Sum[N[

CDF[NormalDistribution[

Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/Factorial[k - 1],

Sqrt[Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/

Factorial[k - 1]]], 1/2] -

CDF[NormalDistribution[

Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/Factorial[k - 1],

Sqrt[Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/

Factorial[k - 1]]], -1/2]], {n, k, 100}]

PoissonApprox[4, 7, 365]

NormalApprox[4, 7, 365]

Hope you enjoyed this. See you again with a nice problem.

Until then

Yours Aye

Me

The generalization here is this: Assume that there is a very long line of persons ready to enter a very large room one by one. Each person is let in and declares her birthday upon entering the room. How many people, on average, must enter in order to find $k$ people that have birthdays that are $d$ days apart (given an $r$-day year)?

**Non-Spoiler**: Some readers may recognize that this Problem 584 of Project Euler. But note that this post neither gives the answer nor the right way to solve that problem. All we are going here is find an approximation of the required value which is far off from the correct answer.Well, when we say 'approximation', we are immediately reminded of the Poisson approximation of the Birthday problem in Wikipedia. But the idea here is that most of the approximations I found in the web concerns only about the probability and not about the expected value. A special case of this problem is solved in Strategic Practice 5 of the (amazing) Stat 110 course by the (extraordinary) Prof. Joseph Blitzstein. And I got curious as to how good the approximation really is.

Let $\mathbb{E}(k,d,r)$ be the expected number of people that have to enter room such that we have $k$ people with birthdays $d$ days apart for the first time.

Let $X_n$ be the number of $k$-set birthday matches in when $n$ people have entered the room. By the Poisson paradigm, $X_n$ can be considered to be a poisson variable. The parameter $\lambda_n$ for this variable is given by,

$\displaystyle\lambda_n=\displaystyle\binom{n}{k}\cdot\frac{1}{(k-1)!}\left(\frac{2d+1}{r}\right)^{k-1}$ (why?)

Therefore we immediately have, $\mathbb{P}(X_n=0)=e^{-\lambda_n}$

Now by a simple trick common in evaluating expected values, we have,

$\begin{align}

\displaystyle\mathbb{E}(k,d,r)&=k+\sum_{n=k}^\infty\mathbb{P}(X_n=0)=k+\displaystyle\sum_{n=k}^\infty e^{-\lambda_n}

\end{align}$

Though we have an infinite sum, the numbers decrease rapidly because of the exponential factor. For the values given the said Project euler problem, this approximation gives $\mathbb{E}_{\text{Poisson}}(3,1,10)\approx 6.15092$ and $\mathbb{E}_{\text{Poisson}}(3,7,100)\approx 8.79751$

But we can take it a little further. When the parameter of a Poisson variable get very large, then the Normal variable with the same mean and variance as the Poisson variable gives better accuracy. Therefore,

$\begin{align}

\displaystyle\mathbb{E}(k,d,r)&=k+\sum_{n=k}^\infty\mathbb{P}(X_n=0)\\

&=k+\displaystyle\sum_{n=k}^\infty \mathbb{P}(-1/2 \leq X_n \leq 1/2)

\end{align}$

where $X_n \sim \mathcal{N}(\lambda_n,\lambda_n)$ along with the continuity correction.

Here again, the numbers decrease fast and the sum converges in about fifty terms. Now we have $\mathbb{E}_{\text{Normal}}(3,1,10)\approx 5.68554$ and $\mathbb{E}_{\text{Normal}}(3,7,100)\approx 8.04791$

In fact for the required answer, the normal approximation gives $\mathbb{E}_{\text{Normal}}(4,7,365)\approx 33.6396$ with a relative error of just about $2.5\%$. Not bad.

These values were calculated with the following Mathematica code.

Clear["Global`*"];

PoissonApprox[k_, d_, r_] :=

k + Sum[N[

Exp[-Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/

Factorial[k - 1]]], {n, k, 100}];

NormalApprox[k_, d_, r_] :=

k + Sum[N[

CDF[NormalDistribution[

Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/Factorial[k - 1],

Sqrt[Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/

Factorial[k - 1]]], 1/2] -

CDF[NormalDistribution[

Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/Factorial[k - 1],

Sqrt[Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/

Factorial[k - 1]]], -1/2]], {n, k, 100}]

PoissonApprox[4, 7, 365]

NormalApprox[4, 7, 365]

Hope you enjoyed this. See you again with a nice problem.

Until then

Yours Aye

Me