Saturday, January 31, 2026

Blending the Birthday Problem and the Coupon Collector Problem

The Birthday Problem and Coupon Collector problem are well known problems in probability that are like duals of each other. While the former deals with 'first collision', the latter deals with the 'complete collection'. I believe they are too famous even warrant further introduction here.

In this post, we attempt to solve a blend of these two problems. We start with the following question: If we are collecting coupons from among $n$ different types of coupons, what is the expected numbers of coupons that are needed to complete the collection given that the first collision occurred at the $j$'th coupon.

Let $K$ be the random variable indicating the number of coupons required to complete the collection and $C$ be the random variable indicated the number of coupons for the first collision. Then, we are looking for $\mathbb{E}(K|C=j)$.

This is not very hard. First, we already have collected $j$ coupons and because the first collision occurred at the $j$'th coupon, we have '$j-1$' distinct coupons. From here, we could proceed just like in the case of the original coupon collector problem. Therefore, we have,

$\begin{align}\mathbb{E}(K|C=j)&=j+\frac{n}{n-(j-1)}+\frac{n}{n-(j-2)}+\cdots+\frac{n}{1}\\&=j+n\text{ }H_{n-j+1} \\\end{align}$

where $H_n$ is the $n$'th Harmonic number. Note that the formula is valid for $1<j\leq n$.

We now deal with the (more challenging) 'reverse' question. That is, given we have completed the collection in $K=j$ coupons, how many cards were average would the first collision have occurred. That is, we look for $\mathbb{E}(C|K=j)$.

After trying out different approaches, I settled on the following approach. Let's first count the number of valid sequences. Let $N(n,j)$ be the number of sequences which collects $n$ different coupons exactly on the $j$'th draw. Obviously, $N(n,j)=0$ for $j<n$ and $N(n,n)=n!$.

With some thought, we could see that

$N(n,j)=n\text{ }N(n-1,j-1)+(n-1)\text{ }N(n,j-1)$

This works because, to collect $n$ exactly at the $j$'th draw, we could've (1) collected $n-1$ exactly at the $j-1$'th draw and the final coupon (for which we have $n$ choices) becomes the new coupon to complete the set (or) taken a sequence which completed the set of $n$ coupons in $j-1$ draws, choose any of the $n-1$ coupons except the last and insert it the penultimate position.

The $N$ function also has 'direct' formula via Stirling numbers of the second kind but we prefer this recurrence. We'll see why in a moment. Now, the expected value we are after can be written as,

$\displaystyle\mathbb{E}(C|K=j)=\frac{T(n,j)}{N(n,j)}$

where $T(n,j)$ is the sum of the first-collision-index of all the valid $N(n,j)$ sequences.

We don't have to look further to solve $T(n,j)$. Note that the neither of the two operations we described above for deriving the recurrence of $N(n,j)$ affects the first-collision-index of that sequence because the last coupon cannot be part of first collision (by definition, it has to be a new coupon to complete the set). Therefore, the same recurrence works for $T(n,j)$ as well!

The only tricky part is the boundary. Consider $T(n,n)$ which is the sum of all first collision indices where the set of $n$ coupons is collected in exactly $n$ draws. But those sequences must contain all distinct elements and therefore cannot have first collision. However, staying true to the process, we attribute a first collision index of $n$ to these sequences as we have collected $n$ coupons and the process have concluded. Therefore, we define, $T(n,n)=n \cdot n!$.

For example, using these recurrences, I get $\displaystyle\mathbb{E}(C|K=8,n=5)=612/175\approx 3.497$.

Hope you enjoyed the discussion.


Until then
Yours Aye
Me