Sunday, September 25, 2022

Sunrise Problem - Without Replacement

The famous Laplace sunrise problem asks for the probability that the sun rises tomorrow given that it has risen on each of the past $n$ days. In this post, we generalize this problem to a setup where the population is finite and sampled without replacement, and see how that changes the required probability.

Mathematically, we are looking for the posterior distribution of the unknown parameter $p$ such that

$X|p \sim \text{Bin}(n,p)$ and $p \sim \text{Beta}(\alpha,\beta)$

It is then well known that the posterior distribution is

$p|X \sim \text{Beta}(\alpha + X, \beta + n - X)$

Therefore, if we start with a uniform prior for $p$, so that $p \sim \text{Beta}(1,1)$, and observe $X=n$, then $p|X=n \sim \text{Beta}(1+n,1)$. Since the mean of a $\text{Beta}(a,b)$ distribution is $a/(a+b)$,

$\displaystyle \mathbb{E}(p|X=n)=\frac{n+1}{n+2}$
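The conjugate update is simple enough to sketch in a few lines of Python. The helper names `beta_binomial_update` and `beta_mean` are made up for this illustration and come from no library; exact fractions are used so that nothing is lost to rounding.

```python
from fractions import Fraction


def beta_binomial_update(alpha, beta, n, x):
    """Posterior Beta parameters after observing x successes in n trials."""
    return alpha + x, beta + n - x


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution (integer parameters assumed)."""
    return Fraction(alpha, alpha + beta)


# Uniform prior Beta(1, 1); the sun has risen on all n = 10 observed days.
n = 10
a_post, b_post = beta_binomial_update(1, 1, n, n)
print(a_post, b_post, beta_mean(a_post, b_post))  # 11 1 11/12
```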

We now consider the following problem: Let's say we have an urn containing $N$ balls, some of which are white and the rest black. We draw $n_1$ balls from the urn without replacement and observe the number of white balls. We then draw a further $n_2$ balls from the remaining $N-n_1$ balls, again without replacement. Now, what is the probability distribution of the number of white balls in the second sample?

Casting the given information in mathematical terms, we have

$X_1|K \sim \text{HG}(N,K,n_1)$
$X_2|K-X_1 \sim \text{HG}(N-n_1,K-X_1,n_2)$ and
$K \sim \text{Beta-Bin}(N,\alpha,\beta)$

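where $K$ is the unknown number of white balls in the urn, $\text{HG}(N,K,n)$ is the hypergeometric distribution (population size $N$, $K$ white balls, $n$ draws), and the Beta-Binomial distribution is the discrete analog of the Beta distribution, with the discrete uniform distribution as the special case $\alpha=\beta=1$. Note that we are looking for $X_2|X_1$.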


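Because the Beta-Binomial prior is conjugate to the hypergeometric likelihood, the number of white balls left in the urn after the first sample satisfies

$K-X_1|X_1 \sim \text{Beta-Bin}(N-n_1,\alpha+X_1,\beta+n_1-X_1)$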
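This posterior can be checked by brute force for small numbers. The sketch below applies Bayes' rule to every possible $K$ and compares the result with the claimed Beta-Binomial. It assumes integer $\alpha$ and $\beta$ so that everything stays an exact fraction; the helper names and the example values are purely illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers a and b, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def beta_bin_pmf(k, n, a, b):
    """P(k) under Beta-Bin(n, a, b); a and b are assumed to be positive integers."""
    if not 0 <= k <= n:
        return Fraction(0)
    return comb(n, k) * beta_fn(k + a, n - k + b) / beta_fn(a, b)


def hg_pmf(x, N, K, n):
    """P(x white in n draws without replacement from N balls, K of them white)."""
    if not (0 <= K <= N and 0 <= x <= n):
        return Fraction(0)
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))


def posterior_remaining(N, n1, x1, a, b):
    """Posterior pmf of K - x1 (white balls left in the urn) given X_1 = x1."""
    weights = [beta_bin_pmf(K, N, a, b) * hg_pmf(x1, N, K, n1) for K in range(N + 1)]
    total = sum(weights)
    return {K - x1: w / total for K, w in enumerate(weights) if 0 <= K - x1 <= N - n1}


# Hypothetical example values, chosen only for illustration.
N, n1, x1, a, b = 12, 5, 3, 2, 3
post = posterior_remaining(N, n1, x1, a, b)
claimed = {m: beta_bin_pmf(m, N - n1, a + x1, b + n1 - x1) for m in range(N - n1 + 1)}
print(post == claimed)
```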

We now take a detour into a slightly different problem.

Let $Y|X \sim \text{HG}(N,X,n)$ and $X \sim \text{Bin}(N,p)$. What would be the unconditional distribution of $Y$?

We could wade through an algebraic mess to solve this, but let's do it with a story proof. Let's say we have $N$ marbles, each of which is independently painted either green (with probability $p$) or red, and put into a bag. If we now draw $n$ marbles from the bag without replacement, what is the probability distribution of the number of green marbles drawn?

With some thinking, it should be easy to see that this is exactly the setup above in terms of $X$ (number of marbles painted green) and $Y$ (number of green marbles drawn). I posted this on Reddit, where one particular line of reasoning was way better than mine, so I give it here.

If we don't know $X$, it is as good as saying that the colored marbles are wrapped in an opaque cover before being put in the bag. Therefore, the drawn marbles are no different from each other and unwrapping them after being drawn, we would find a marble to be green with probability $p$ and red otherwise. Therefore,

$Y \sim \text{Bin}(n,p)$

Notice that $N$ disappears and contributes nothing to the unconditional distribution. This result holds even if we replace the $\text{Bin}$ distribution with a $\text{Beta-Bin}$ distribution: if $X \sim \text{Beta-Bin}(N,\alpha,\beta)$, then $Y \sim \text{Beta-Bin}(n,\alpha,\beta)$.
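For anyone who prefers a numerical check over the story proof, the binomial case can be confirmed exactly by summing the hypergeometric probabilities over all values of $X$. This is only a sketch with made-up example values ($N=12$, $n=5$, $p=3/10$); the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def binom_pmf(k, n, p):
    """P(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hg_pmf(y, N, X, n):
    """P(y green in n draws without replacement from N marbles, X of them green)."""
    if not (0 <= X <= N and 0 <= y <= n):
        return Fraction(0)
    return Fraction(comb(X, y) * comb(N - X, n - y), comb(N, n))


def marginal_pmf(y, N, n, p):
    """Unconditional P(Y = y), averaging the hypergeometric pmf over X ~ Bin(N, p)."""
    return sum(binom_pmf(X, N, p) * hg_pmf(y, N, X, n) for X in range(N + 1))


# Hypothetical example values; p is a Fraction so the comparison is exact.
N, n, p = 12, 5, Fraction(3, 10)
matches = all(marginal_pmf(y, N, n, p) == binom_pmf(y, n, p) for y in range(n + 1))
print(matches)
```

Changing `N` in the example leaves the marginal untouched, which is the "$N$ disappears" observation in numerical form.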

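We now get back to our original problem. Given $X_1$, the urn holds $N-n_1$ balls of which $K-X_1$ are white, so $X_2|K-X_1$ and $K-X_1|X_1$ play the roles of $Y|X$ and $X$ in the detour (with $N-n_1$ and $n_2$ in place of $N$ and $n$). We can then immediately see that $X_2|X_1$ is Beta-Binomially distributed. Therefore,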

$X_2|X_1 \sim \text{Beta-Bin}(n_2,\alpha+X_1,\beta+n_1-X_1)$
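As a sanity check, the predictive distribution of $X_2$ can also be computed directly, by applying Bayes' rule over every possible $K$ and then averaging the second hypergeometric draw. The sketch below assumes integer $\alpha$ and $\beta$ for exact arithmetic; all names and example values are illustrative only.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers a and b, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def beta_bin_pmf(k, n, a, b):
    """P(k) under Beta-Bin(n, a, b); a and b are assumed to be positive integers."""
    if not 0 <= k <= n:
        return Fraction(0)
    return comb(n, k) * beta_fn(k + a, n - k + b) / beta_fn(a, b)


def hg_pmf(x, N, K, n):
    """P(x white in n draws without replacement from N balls, K of them white)."""
    if not (0 <= K <= N and 0 <= x <= n):
        return Fraction(0)
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))


def predictive_pmf(x2, N, n1, x1, n2, a, b):
    """P(X_2 = x2 | X_1 = x1), obtained by summing over the unknown K."""
    num = Fraction(0)
    den = Fraction(0)
    for K in range(N + 1):
        w = beta_bin_pmf(K, N, a, b) * hg_pmf(x1, N, K, n1)  # prior times likelihood
        den += w
        num += w * hg_pmf(x2, N - n1, K - x1, n2)  # second draw from what is left
    return num / den


# Hypothetical example values (n1 + n2 must not exceed N).
N, n1, x1, n2, a, b = 10, 4, 3, 3, 2, 1
agrees = all(
    predictive_pmf(x2, N, n1, x1, n2, a, b) == beta_bin_pmf(x2, n2, a + x1, b + n1 - x1)
    for x2 in range(n2 + 1)
)
print(agrees)
```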

If, for example, we started with an urn of $N$ balls (in which some are white and some are black), put a uniform prior on the number of white balls ($\alpha=\beta=1$), drew $n_1=n$ balls without replacement and found all of them to be white, then

$X_2|X_1=n \sim \text{Beta-Bin}(n_2,1+n,1)$

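If the size of the second sample is 1, then $X_2$ is an indicator variable and its conditional mean is the probability that the next ball is white:

$\displaystyle \mathbb{E}(X_2|X_1=n)=\mathbb{P}(X_2=1|X_1=n)=\frac{n+1}{n+2}$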

Very surprisingly, this is exactly the same answer we would get even if we had made the draws with replacement (which is exactly the Sunrise problem)!!
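The fact that $N$ drops out can be seen numerically with a short brute-force sketch. It assumes a uniform prior on the number of white balls and $N > n$; the function name `prob_next_white` is made up for this illustration.

```python
from fractions import Fraction
from math import comb


def prob_next_white(N, n):
    """P(ball n+1 is white | the first n balls drawn without replacement were white).

    Assumes a uniform prior on the number of white balls K in {0, ..., N} and N > n.
    """
    # Likelihood of n whites in n draws for each K; the constant prior 1/(N+1) cancels.
    weights = {K: Fraction(comb(K, n), comb(N, n)) for K in range(N + 1)}
    total = sum(weights.values())
    # After n whites are removed, K - n of the remaining N - n balls are white.
    return sum(w * Fraction(K - n, N - n) for K, w in weights.items()) / total


n = 5
for N in (6, 20, 100):
    print(N, prob_next_white(N, n))  # each line ends in 6/7, i.e. (n+1)/(n+2)
```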

We have so far generalized the Birthday problem, the Coupon collector problem and the problem of points on this blog, and in every case the 'finiteness' of the population alters the final result in one way or another. In this case, though, I doubt that many would have guessed at the outset that the probability remains unchanged in the finite case.

Until then
Yours Aye
Me
