Sunday, September 25, 2022

Sunrise Problem - With Replacement

The famous Laplace Sunrise problem asks for the probability of the sun rising tomorrow given that it has risen for the past $n$ days. In this post, we generalize this problem to a setup where the population size is finite and how it changes the required probability.

Mathematically, we are looking for the posterior distribution of the unknown parameter $p$ such that

$X|p \sim \text{Bin}(n,p)$ and $p \sim \text{Beta}(\alpha,\beta)$

It is then well known that the posterior distribution is

$p|X \sim \text{Beta}(\alpha + X, \beta + n - X)$

Therefore, if we start with a uniform prior for $p$ so that $p \sim \text{Beta}(1,1)$ and given that $X=n$, then $p|X=n \sim \text{Beta}(1+n,1)$. Then

$\displaystyle \mathbb{E}(p|X)=\frac{n+1}{n+2}$

We now consider the following problem: Let's say we have an Urn containing $N$ balls such that some are white and rest are black. Let's say we draw $n_1$ balls from the Urn without replacement and observe the number of white balls. We then take $n_2$ balls from the Urn. Now, what is the probability distribution of the number of white balls in the second sample?

Casting the given information in mathematical terms, we have

$X_1|K \sim \text{HG}(N,K,n_1)$
$X_2|K-X_1 \sim \text{HG}(N-n_1,K-X_1,n_2)$ and
$K \sim \text{Beta-Bin}(N,\alpha,\beta)$

where the Beta-Binomial distribution is the discrete analog of the Beta distribution and has the discrete uniform distribution as the special case when $\alpha=\beta=1$. Note that we are looking for $X_2|X_1$.


$K-X_1|X_1 \sim \text{Beta-Bin}(N-n_1,\alpha+X_1,\beta+n_1-X_1)$

We now take a detour into a slightly different problem.

Let $Y|X \sim \text{HG}(N,K,n)$ and $K \sim \text{Bin}(N,p)$. What would be the unconditional distribution of $Y$?

We can get into an algebraic mess of expression to solve this but let's do it with a story proof. Let's say we have $N$ marbles each of which is painted either green (with probability $p$) or red and put into a bag. If we now draw $n$ marbles from the bag without replacement, what is the probability distribution of the green marbles?

With some thinking, it should be easy to easy that this is exactly what we have interms of $X$ (number of marbles painted green) and $Y$ (number of green marbles drawn) above. I posted this on reddit of which one particular reasoning was way better than mine which I give here.

If we don't know $X$, it is as good as saying that the colored marbles are wrapped in an opaque cover before being put in the bag. Therefore, the drawn marbles are no different from each other and unwrapping them after being drawn, we would find a marble to be green with probability $p$ and red otherwise. Therefore,

$Y \sim \text{Bin}(n,p)$

Notice that $N$ disappears and contributes nothing to the unconditional distribution. This result holds even if we replace the $\text{Bin}$ distribution with a $\text{Beta-Bin}$ distribution.

We now get back to our original problem. Knowing $X_2|K-X_1$ and $K-X_1|X_1$, we can immediately see that $X_2|X_1$ is Beta-binomially distributed. Therefore,

$X_2|X_1 \sim \text{Beta-Bin}(n_2,\alpha+X_1,\beta+n_1-X_1)$

If, for example, we started with an Urn of $N$ balls (in which some are white and some are black) and drew $n$ balls without replacement and found all of them to be white, then

$X_2|X_1=n \sim \text{Beta-Bin}(n_2,1+n,1)$

If the size of the second sample is 1, then this clearly shows that

$\displaystyle \mathbb{E}(X_2|X_1)=\frac{n+1}{n+2}$

Very surprisingly, this is exactly the same answer we would get even if we had made the draws with replacement (which is exactly the Sunrise problem)!!

We have so far generalized the Birthday problem, the Coupon collector problem and the problem of points in our blog and in every case, the 'finiteness' of the population alters the final result in one way or the other. Even in this case, at the onset, I doubt that very few would feel that the probability would remain unchanged in the finite case.

Until then
Yours Aye
Me

Expected number of cards drawing two Aces in a row

Let's say we shuffle a standard 52-card deck and draw cards one by one looking for.consecutive cards that form a pair (two cards with the same rank).

I was interested in this question after seeing this StackExchange post. Though the post does not offer an exact solution, the approximate solution given there was really interesting. When I solved the question finding the exact solution, I was really surprised by how good the approximation is.

In this post, we will solving a similar question both exactly and approximately. We are interested in the expected number of draws without replacement before drawing two Aces in a row.

Just to be clear, there may be cases we exhaust all the cards without finding two Aces in a row. In these cases, the number of cards drawn will obviously be 52.

Let $N$ be the random variable indicating the number of draws and we want $\mathbb{E}(N)$.

To solve this in exact form, we first need the probability of drawing $j$ Aces when drawing $k$ cards from the deck such that the Aces are not together. Let's denote this event by $T_{j,k}$. Then it is easy to see that,

$\displaystyle\mathbb{P}(T_{j,k})=\mathbb{P}(j\text{ Aces in }k\text{ draws})\mathbb{P}(\text{Aces are not consecutive})$

The first term is textbook Hypergeometric. For the second term, we lay the $k-j$ non-Ace cards giving us $k-j+1$ spaces between them. We simply have to $j$ spaces out of these for the Aces. Therefore,

$\displaystyle\mathbb{P}(T_{j,k})=\frac{\binom{4}{j}\binom{48}{k-j}}{\binom{52}{k}}\cdot \frac{\binom{k-j+1}{j}}{\binom{k}{j}}$

Therefore, the probability that $N>k$ i.e. we have no consecutive Aces from a set of $k$ cards drawn from the deck is

$\displaystyle\mathbb{P}(N>k)=\sum_{j=0}^4\mathbb{P}(T_{j,k})$

Note that $N$ can only take values between $0$ and $52$ (both inclusive). Therefore,

$\displaystyle\mathbb{E}(N)=\sum_{k=0}^{51}\mathbb{P}(N>k)=\sum_{k=0}^{51}\sum_{j=0}^4\mathbb{P}(T_{j,k})$


$\displaystyle\mathbb{E}(N)=\frac{257026}{5525}\approx 46.5205$

The simplified fraction kind of hints that there may be a more direct way of solving this. I'll update if I find something on this.

To solve this approximately, note that $\mathbb{P}(N>0)=\mathbb{P}(N>1)=1$. Also,

$\displaystyle \mathbb{P}(N>2)=\mathbb{P}(\text{first two cards are not consecutive Aces})=1-\mathbb{P}(\text{first two cards are Aces})=1-\frac{4}{52}\frac{3}{51}=\frac{220}{221}$

Now comes the approximation. On each subsequent draws, we assume that the probability that the current and the previous draw does not form two consecutive Aces is still $220/221$ and that each of these draws are independent. Therefore,

$\displaystyle \mathbb{P}(N>k)\approx\left(\frac{220}{221}\right)^{k-1}$ for $k>0$

This gives

$\displaystyle\mathbb{E}(N)=\mathbb{P}(N>0)+\sum_{k=1}^{51}\mathbb{P}(N>k)\approx 1+\sum_{k=1}^{51}\left(\frac{220}{221}\right)^{k-1}\approx 46.6350$

That's already a great approximation deviating from the correct value by less than quarter of a percentage.

If we further use a exponential approximation (which can also be obtained directly using a Poisson approximation), we have

$\displaystyle \mathbb{E}(N)\approx 1+51\frac{1-e^{-51\cdot 1/221}}{51\cdot 1/221}\approx 46.5431$

That's an insane level of accuracy!!

The following Mathematica code is used for this post.

Clear["Global`*"];
ProbG[k_] := Sum[Binomial[k - j + 1, j] / Binomial[k, j] * Binomial[4, j] Binomial[48, k - j] / Binomial[52, k], {j, 0, Min[4, k]}];
res = Sum[ProbG[k], {k, 0, 51}];
N[res, 15]
lam = 51 * Binomial[4, 2] / Binomial[52, 2];
approx = 1 + 51 * (1 - Exp[-lam]) / lam;
N[approx, 15]
p = Binomial[4, 2] / Binomial[52, 2];
approx = 1 + (1 - Power[1 - p, 51]) / p;
N[approx, 15]

The following code is the simulation for the problem.

Clear["Global`*"];
n = 10000000;
res = 0;
Do[
    test = Split[Mod[RandomSample[Range[0, 51]], 13]];
    Do[
        If[Length[k] <= 1,
            res += 1;,
            If[First[k] > 0, res += Length[k];, res += 2; Break[];];
        ];
    ,{k, test}
    ];
, {j, n}
];
res / n
N[%]

Until then
Yours Aye
Me