Wednesday, August 18, 2021

Birds on a Wire - a Probability puzzle

This post is about birds on a wire - a nice probability puzzle that I recently saw on cut-the-knot website. The site also gives five solutions all of which are somewhat informal.

While all the solutions were pretty, I was particularly struck by the elegant solution by Stuart Anderson. Studying his solution made me realize that the condition that '$n$ is large' is not very significant. In fact, we'll use his idea to solve the question for the finite version of the problem.

Let the length of the wire be $1$ unit. Let $X_k$ ($1\leq k \leq n$) denote the position of the $k$th bird to sit on the wire. That is, all the $X_k$'s are independent standard uniform random variables.

Let $X_{(k)}$ be the random variable denoting the $k$th Order statistic. Now let's define new random variables $Y_k$ for $0 < k \leq n+1$ such that

$Y_k=X_{(k)}-X_{(k-1)}$ with $X_{(0)}=0$ and $X_{(n+1)}=1$.

Note that the $n$ birds divide the wire into $n+1$ segments and the random variable $Y_k$ denote the length of the $k$th segment.

Using the joint distribution of the Order statistics of the Uniform distribution, we can see that the $Y_k$s are identical (but not necessarily independent) $Beta(1,n)$ variables.

Therefore, the cumulative distribution for any $Y_k$ is $F(y)=\mathbb{P}(Y_k \leq y)=1-(1-y)^n$ and the expected value is $\mathbb{E}(Y_k)=1/(n+1)$.

Now if $n$ were large, we can use the idea that the above expression for CDF is approximated by $1-e^{-ny}$ and the expected value by $1/n$. But as noted earlier, we don't have to assume $n$ is large and proceed directly.

Let's define new variables $C_k$ such that

$\displaystyle C_k=\begin{cases}0, & \text{if }k \text{th segment is not colored}\\  Y_k, & \text{otherwise} \end{cases}$

It is straight forward to see that

(i) $\mathbb{E}(C_1)=\mathbb{E}(C_{n+1})=0$ because those segments never get coloured and
(ii) $\mathbb{E}(C_2)=\mathbb{E}(Y_2)=\mathbb{E}(C_n)=\mathbb{E}(Y_n)=1/(n+1)$ because those segments always gets coloured.

For the cases $2<k<n$, consider the segment $Y_k$ along with its neighbours $Y_{k-1}$ and $Y_{k+1}$. A minute's thought shows that the $k$th segment is not coloured if and only if it is the biggest among the three variables.

If we let $Z_k=\max(Y_{k-1},Y_k,Y_{k+1})$, we can rewrite $C_k$ as

$\displaystyle C_k=\begin{cases}0, & Y_k=Z_k\\  Y_k, & \text{otherwise} \end{cases}$

But $\displaystyle \mathbb{P}(Y_k=Z_k)=\frac{1}{3}$ by symmetry. Therefore,

$\displaystyle \mathbb{E}(C_k)=\mathbb{P}(Y_k=Z_k)\cdot 0+\mathbb{P}(Y_k \ne Z_k) \mathbb{E}(Y_k|Y_k \ne Z_k)=\frac{2}{3}\mathbb{E}(Y_k|Y_k \ne Z_k)$

To do this, we condition on $Y_k$. This gives,

$\displaystyle \mathbb{E}(Y_k)=\frac{1}{3}\mathbb{E}(Y_k|Y_k=Z_k)+\frac{2}{3}\mathbb{E}(Y_k|Y_k \ne Z_k)$

Rearranging this, we have

$\displaystyle \mathbb{E}(C_k)=\frac{2}{3}\mathbb{E}(Y_k|Y_k \ne Z_k)=\mathbb{E}(Y_k)-\frac{1}{3}\mathbb{E}(Y_k|Y_k=Z_k)=\mathbb{E}(Y_k)-\frac{1}{3}\mathbb{E}(Z_k)$

To compute the expected value of $Z_k$, we proceed in two parts.

First, consider any distribution with CDF $F(v)$ and the order statistics $V_{(i)}$, $V_{(j)}$ and $V_{(k)}$ with $i<j<k$.

It can be seen intuitively that the conditional distribution of $V_{(j)}$ given $V_{(i)}$ and $V_{(k)}$ is the same as the $(j-i)$th order statistic obtained from a sample of size $k-i-1$ whose distribution is $F(v)$ truncated on the left at $V_{(i)}$ and on the right at $V_{(k)}$.

For example, if we know that $V_{(5)}=0.4$ and $V_{(10)}=0.9$ from the standard uniform distribution $U(0,1)$, then $V_{(7)}$ has the same distribution as $W_{(2)}$ from a sample of size $4(=10-5+1)$ from the distribution $W\sim U(0.4,0.9)$.

This result can be obtained by using Theorems 2 and 3 given here.

Second, it is a well known result in probability (cuttheknothere and here to quote a few) that if a line of length 1 unit is broken at $n-1$ points giving $n$ segments, then the expected length of the $k$th largest segment is $(1/k+\cdots+1/n)/n$. Specifically, the length of the largest segment in a 3 segment case is $11/18$.

Therefore,

$\displaystyle \mathbb{E}(Z_k)=\mathbb{E}(\mathbb{E}(Z_k|X_{(k-2)},X_{(k+1)}))=\mathbb{E}\left(\frac{11}{18}(X_{(k+1)}-X_{(k-2)})\right)=\frac{11}{18}\frac{3}{n+1}$

Using the above and substituting the known values, we get

$\displaystyle \mathbb{E}(C_k)=\frac{1}{n+1}-\frac{1}{3}\frac{11}{18}\frac{3}{n+1}=\frac{7}{18n+18}$

If we now let $C=C_1+C_2+\cdots+C_n+C_{n+1}$, then using the linearity of expectations, we get

$\displaystyle \mathbb{E}(C)=\frac{2}{n+1}+(n-3)\frac{7}{18n+18} =\frac{7n+15}{18n+18}$

Note that this applies only when $n\geq 3$ (cases $n<3$ are trivial to calculate). More importantly, this shows that the expected value tends to $7/18$ for large $n$.

I would like to thank Abhilash Unnikrishan for pointing out the error (I assumed that $Y_k$s are independent) in my original solution and to use the 'expected value of largest segment in a line' to simplify calculations in arrive at the final result. 

Hope you enjoyed the discussion.


Until then
Yours Aye
Me