In this post, we attempt to solve a blend of these two problems. We start with the following question: If we are collecting coupons from among $n$ different types of coupons, what is the expected numbers of coupons that are needed to complete the collection given that the first collision occurred at the $j$'th coupon.
Let $K$ be the random variable indicating the number of coupons required to complete the collection and $C$ be the random variable indicated the number of coupons for the first collision. Then, we are looking for $\mathbb{E}(K|C=j)$.
This is not very hard. First, we already have collected $j$ coupons and because the first collision occurred at the $j$'th coupon, we have '$j-1$' distinct coupons. From here, we could proceed just like in the case of the original coupon collector problem. Therefore, we have,
$\begin{align}\mathbb{E}(K|C=j)&=j+\frac{n}{n-(j-1)}+\frac{n}{n-(j-2)}+\cdots+\frac{n}{1}\\&=j+n\text{ }H_{n-j+1} \\\end{align}$
where $H_n$ is the $n$'th Harmonic number. Note that the formula is valid for $1<j\leq n$.
We now deal with the (more challenging) 'reverse' question. That is, given we have completed the collection in $K=j$ coupons, how many cards were average would the first collision have occurred. That is, we look for $\mathbb{E}(C|K=j)$.
After trying out different approaches, I settled on the following approach. Let's first count the number of valid sequences. Let $N(n,j)$ be the number of sequences which collects $n$ different coupons exactly on the $j$'th draw. Obviously, $N(n,j)=0$ for $j<n$ and $N(n,n)=n!$.
With some thought, we could see that
$N(n,j)=n\text{ }N(n-1,j-1)+(n-1)\text{ }N(n,j-1)$
This works because, to collect $n$ exactly at the $j$'th draw, we could've (1) collected $n-1$ exactly at the $j-1$'th draw and the final coupon (for which we have $n$ choices) becomes the new coupon to complete the set (or) taken a sequence which completed the set of $n$ coupons in $j-1$ draws, choose any of the $n-1$ coupons except the last and insert it the penultimate position.
The $N$ function also has 'direct' formula via Stirling numbers of the second kind but we prefer this recurrence. We'll see why in a moment. Now, the expected value we are after can be written as,
$\displaystyle\mathbb{E}(C|K=j)=\frac{T(n,j)}{N(n,j)}$
where $T(n,j)$ is the sum of the first-collision-index of all the valid $N(n,j)$ sequences.
We don't have to look further to solve $T(n,j)$. Note that the neither of the two operations we described above for deriving the recurrence of $N(n,j)$ affects the first-collision-index of that sequence because the last coupon cannot be part of first collision (by definition, it has to be a new coupon to complete the set). Therefore, the same recurrence works for $T(n,j)$ as well!
The only tricky part is the boundary. Consider $T(n,n)$ which is the sum of all first collision indices where the set of $n$ coupons is collected in exactly $n$ draws. But those sequences must contain all distinct elements and therefore cannot have first collision. However, staying true to the process, we attribute a first collision index of $n$ to these sequences as we have collected $n$ coupons and the process have concluded. Therefore, we define, $T(n,n)=n \cdot n!$.
For example, using these recurrences, I get $\displaystyle\mathbb{E}(C|K=8,n=5)=612/175\approx 3.497$.
The following python code can be used to check the validity of the formula.
import itertools
coups = 3
comps = 5
types = range(coups)
def first_collision(seq):
seen = set()
for i, x in enumerate(seq):
if x in seen:
return i + 1
seen.add(x)
return None
collision_times = []
for seq in itertools.product(types, repeat=comps):
seen = set()
t = None
ok = True
for i, x in enumerate(seq):
if x in seen and t is None:
t = i + 1
seen.add(x)
if len(seen) == coups:
if i != comps - 1: # must first complete at draw comps
ok = False
break
if ok and len(seen) == coups:
collision_times.append(t)
# outputs
print("Number of valid sequences:", len(collision_times))
print("Sum of first collision indices:", sum(collision_times))
print("Expected first collision time:", sum(collision_times) / len(collision_times))
Hope you enjoyed the discussion.
Until then
Yours Aye
Me
No comments:
Post a Comment