Tuesday, November 1, 2022

Minimum of an Exponential random variable - First arrival times

Let $T$ be an exponential random variable with parameter $\lambda$. In this post, we are going to find the expected value of $\text{min}(T,1)$ with four different methods, each of which is beautiful in its own way.

Direct integration
We know that the density function of $T$ is $\lambda e^{-\lambda t}$. Therefore,

$\begin{align}\displaystyle\mathbb{E}(\text{min}(T,1)) &= \int_0^\infty \text{min}(t,1)\lambda e^{-\lambda t}\,dt \\ &= \int_0^1 t\cdot\lambda e^{-\lambda t}\,dt+\int_1^\infty 1\cdot\lambda e^{-\lambda t}\,dt \\ &= \frac{1-(1+\lambda)e^{-\lambda}}{\lambda}+e^{-\lambda}\\ &= \frac{1-e^{-\lambda}}{\lambda}\end{align}$

Conditional expectation
Here we condition based on the value of the random variable and use the fact that $\mathbb{P}(T>t)=e^{-\lambda t}$ and $\mathbb{E}(T)=1/\lambda$. By the law of total expectation,

$\begin{align}\displaystyle\mathbb{E}(\text{min}(T,1))&=\mathbb{P}(T<1)\mathbb{E}(\text{min}(T,1)|T<1)+\mathbb{P}(T>1)\mathbb{E}(\text{min}(T,1)|T>1) \\ &=\mathbb{P}(T<1)\mathbb{E}(T|T<1)+\mathbb{P}(T>1) \end{align}$

But we can get the first term by conditioning $\mathbb{E}(T)$ on whether $T<1$ or $T>1$. We know that

$\displaystyle \mathbb{E}(T)=\mathbb{P}(T<1)\mathbb{E}(T|T<1)+\mathbb{P}(T>1)\mathbb{E}(T|T>1)$

Using the memoryless property, we know that $\mathbb{E}(T|T>1)=1+\mathbb{E}(T)$. Therefore,

$\begin{align}\mathbb{P}(T<1)\mathbb{E}(T|T<1)&=\mathbb{E}(T)-\mathbb{P}(T>1)\mathbb{E}(T|T>1)\\&=\mathbb{E}(T)-\mathbb{P}(T>1)(1+\mathbb{E}(T))\\&=\mathbb{P}(T<1)\mathbb{E}(T)-\mathbb{P}(T>1)\\ \end{align}$

Using this, we get,

$\mathbb{E}(\text{min}(T,1))=\mathbb{P}(T<1)\mathbb{E}(T)=\displaystyle\frac{1-e^{-\lambda}}{\lambda}$

Story proof
Let's say the arrival of customers to a store follows a Poisson process with rate $\lambda$. On a given day, the store owner decides to wait for the next customer for a maximum of one hour and shut shop thereafter.

Using this story, we can see that the waiting time of the store owner is $\text{min}(T,1)$ where $T\sim \text{Exp}(\lambda)$.

If the store owner does not have a time limit of one hour, then he would wait for $\mathbb{E}(T)=1/\lambda$ hours for the first customer.

However, if no customer arrives during the first hour, which happens with probability $e^{-\lambda}$, then he would have to wait, on average, for another $1/\lambda$ units of time for the first customer by the memoryless property. Therefore, because of his time limit, he saves, on average, $e^{-\lambda}/\lambda$ units of time.

This shows that the average waiting time is $1/\lambda-e^{-\lambda}/\lambda$.

Using the Poisson distribution
The previous two approaches used the memoryless property in one way or another. Here we do away with that.

Using the story above, let $X$ denote the number of customers that arrive during the first hour. It should be intuitively plausible that, given $X$, the expected value of $\text{min}(T,1)$ is $1/(X+1)$ hours. For example, if three customers arrive, we would expect their arrival times, on average, to be at the 15th, 30th and 45th minute marks (and if none arrive, the owner waits the full hour).

Obviously $X\sim\text{Poisson}(\lambda)$. Therefore,

$\displaystyle\mathbb{E}\left(\frac{1}{X+1}\right)=\sum_{k=0}^\infty \frac{1}{k+1}\cdot\frac{e^{-\lambda}\lambda^k}{k!}=\frac{1}{\lambda}\sum_{k=1}^\infty \frac{e^{-\lambda}\lambda^k}{k!}=\frac{1-e^{-\lambda}}{\lambda}$
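As a quick sanity check, here is a minimal Mathematica sketch (the sample rate $\lambda = 3/2$ is arbitrary) comparing the closed form with a direct integration and a simulation:

Clear["Global`*"];
(* Check E[min(T,1)] = (1 - Exp[-lambda])/lambda by direct integration and by simulation *)
lambda = 3/2;
closed = (1 - Exp[-lambda]) / lambda;
direct = Integrate[t lambda Exp[-lambda t], {t, 0, 1}] + Integrate[lambda Exp[-lambda t], {t, 1, Infinity}];
sim = Mean[Min[#, 1] & /@ RandomVariate[ExponentialDistribution[lambda], 10^6]];
N[{closed, direct, sim}]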

There we have it! Four different proofs for this expectation. I'll detail why this expectation is kinda important in a later post.

Before I forget, please allow me to credit reddit users lukewarmtoasteroven and blungbat for the second and third proofs respectively in my reddit question on the same topic.

Until then
Yours Aye
Me

Wednesday, October 26, 2022

Asymptotic Probability of not finding a pair

Consider the problem of finding the probability of not getting a pair (i.e., no two consecutive cards of the same rank) in a well shuffled pack. We already discussed this problem in this post, where I quoted this stackexchange post that uses generating functions to solve the problem.

I was eventually able to understand the generating function approach in the SE post. I'll postpone that to a later post, as we'll be looking at something equally interesting here: Laplace's method, a well-known approach that is indispensable in this type of problem.

Using the ideas in the SE post, we see that the probability of not finding a pair in a well shuffled deck is given by the integral,

$\displaystyle \mathbb{P}(\text{No pair})=\frac{1}{52!}\int_0^\infty e^{-z} (z^4-12z^3+36z^2-24z)^{13}\,dz$

Notice that this is not exactly in the form given in the SE post but we'll probe how the two expressions are equivalent in a later post. For now, let's focus just on the integral. Let,

$\displaystyle I_{4,n}=\int_0^\infty e^{-z} (z^4-12z^3+36z^2-24z)^n\,dz$

To apply Laplace's method, we manipulate the above as shown below.

$\displaystyle I_{4,n}=\int_0^\infty e^{n(-z/n+\ln(z^4-12z^3+36z^2-24z))}\,dz$

We now need the value of $z=z_0$ where the function $f(z)=-z/n+\ln(z^4-12z^3+36z^2-24z)$ is maximized. Here's where things get a little interesting.

Even though it seems very hard to find the exact value of $z_0$, some experimentation suggests that $z_0 \approx 4n+3$ for large $n$.

Expanding $f(z)$ at $z=4n+3$ and keeping terms up to $O(n^{-1})$, we see that

$\displaystyle f(z)\approx -4-\frac{3}{n}+\ln(256n^4-288n^2-96n+9)-\frac{(z-4n-3)^2}{8n^2}$

Therefore,

$\displaystyle I_{4,n}\approx e^{-4n-3}(256n^4-288n^2-96n+9)^n \int_{-\infty}^\infty e^{\frac{-(z-4n-3)^2}{8n}}\,dz$

Using the Normal approximation, we then have,

$I_{4,n} \approx e^{-4n-3}(256n^4-288n^2-96n+9)^n\sqrt{2\pi}\sqrt{4n}$

Let $\mathbb{P}_{j,n}(\text{No Pair})$ be the probability of getting no pair in a pack of $nj$ cards where each card has an integer from $1$ to $n$ and each integer has $j$ cards. Then,

$\displaystyle\mathbb{P}_{j,n}(\text{No Pair})=\frac{I_{j,n}}{(nj)!}$

Using Stirling's approximation, we know that

$n! \approx \sqrt{2\pi n}\,e^{-n}n^n$

Therefore,

$\displaystyle \mathbb{P}_{4,n}(\text{No Pair})\approx \frac{e^{-4n-3}(256n^4-288n^2-96n+9)^n\sqrt{2\pi}\sqrt{4n}}{\sqrt{2\pi}\sqrt{4n}e^{-4n}(4n)^{4n}}$

Simplifying further, we get

$\displaystyle \mathbb{P}_{4,n}(\text{No Pair})\approx e^{-3}\left(1-\frac{288n^2+96n-9}{256n^4}\right)^n\approx e^{-3}\left(1-\frac{9}{8n}\right)$

Doing this for different $j$, I was able to see that, in general,

$\displaystyle \mathbb{P}_{j,n}(\text{No Pair})\approx e^{1-j}\left(1-\frac{(j-1)^2}{2nj}\right)$

For example, this approximation gives $\mathbb{P}_{4,13}(\text{No Pair})\approx 0.0454785$ versus the exact value of $\mathbb{P}_{4,13}(\text{No Pair})=0.0454763$ which is extremely good.

Unfortunately, it's not always the case. For example, the approximation above gives $\mathbb{P}_{13,4}(\text{No Pair})\approx 0.00000441615$ versus an actual value, given in the SE post quoted at the beginning of this post, of $0.00000118175$, which is understandable as $n$ in this case is very small.

But using the approximations $e^x\approx 1+x$ and $\ln (1-x)^{-1}\approx x+x^2/2$, we can refine our approximation to get,

$\displaystyle \mathbb{P}_{j,n}(\text{No Pair})\approx e^{-(j-1)}e^{\frac{-(j-1)^2}{2nj}}\approx \text{exp}\left(-nj \ln\left(1-\frac{1-j^{-1}}{n}\right)^{-1}\right)=\left(1-\frac{1-j^{-1}}{n}\right)^{nj}$

Now this 'hand-wavy' approximation gives

$\mathbb{P}_{4,13}(\text{No Pair})\approx 0.045501$ and $\mathbb{P}_{13,4}(\text{No Pair})\approx 0.00000118835$

which is much better in both cases and gives us some reason to take the 'hand-waviness' seriously.
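For what it's worth, these figures can be reproduced with a few lines of Mathematica (a minimal sketch; the exact value comes from the integral formula quoted at the start of the post):

Clear["Global`*"];
(* Exact P(no pair) for j = 4, n = 13 from the integral formula, versus the two approximations *)
exact = Integrate[Exp[-z] (z^4 - 12 z^3 + 36 z^2 - 24 z)^13, {z, 0, Infinity}]/52!;
approx[j_, n_] := Exp[1 - j] (1 - (j - 1)^2/(2 n j));
refined[j_, n_] := (1 - (1 - 1/j)/n)^(n j);
N[{exact, approx[4, 13], refined[4, 13]}, 6]
N[refined[13, 4], 6]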

Until then
Yours Aye
Me

Sunday, September 25, 2022

Sunrise Problem - Without Replacement

The famous Laplace Sunrise problem asks for the probability of the sun rising tomorrow given that it has risen for the past $n$ days. In this post, we generalize this problem to a setup where the population size is finite and see how it changes the required probability.

Mathematically, we are looking for the posterior distribution of the unknown parameter $p$ such that

$X|p \sim \text{Bin}(n,p)$ and $p \sim \text{Beta}(\alpha,\beta)$

It is then well known that the posterior distribution is

$p|X \sim \text{Beta}(\alpha + X, \beta + n - X)$

Therefore, if we start with a uniform prior for $p$ so that $p \sim \text{Beta}(1,1)$ and given that $X=n$, then $p|X=n \sim \text{Beta}(1+n,1)$. Then

$\displaystyle \mathbb{E}(p|X)=\frac{n+1}{n+2}$

We now consider the following problem: Let's say we have an Urn containing $N$ balls such that some are white and rest are black. Let's say we draw $n_1$ balls from the Urn without replacement and observe the number of white balls. We then take $n_2$ balls from the Urn. Now, what is the probability distribution of the number of white balls in the second sample?

Casting the given information in mathematical terms, we have

$X_1|K \sim \text{HG}(N,K,n_1)$
$X_2|K-X_1 \sim \text{HG}(N-n_1,K-X_1,n_2)$ and
$K \sim \text{Beta-Bin}(N,\alpha,\beta)$

where the Beta-Binomial distribution is the discrete analog of the Beta distribution and has the discrete uniform distribution as the special case when $\alpha=\beta=1$. Note that we are looking for $X_2|X_1$.


By the discrete analogue of the Beta posterior update above, we have

$K-X_1|X_1 \sim \text{Beta-Bin}(N-n_1,\alpha+X_1,\beta+n_1-X_1)$

We now take a detour into a slightly different problem.

Let $Y|K \sim \text{HG}(N,K,n)$ and $K \sim \text{Bin}(N,p)$. What would be the unconditional distribution of $Y$?

We can get into an algebraic mess of expressions to solve this, but let's do it with a story proof. Let's say we have $N$ marbles each of which is painted either green (with probability $p$) or red and put into a bag. If we now draw $n$ marbles from the bag without replacement, what is the probability distribution of the number of green marbles drawn?

With some thinking, it should be easy to see that this is exactly what we have in terms of $K$ (the number of marbles painted green) and $Y$ (the number of green marbles drawn) above. I posted this on reddit, and one particular line of reasoning was way better than mine, which I give here.

If we don't know $K$, it is as good as saying that the marbles are wrapped in an opaque cover before being put in the bag. Therefore, the drawn marbles are no different from each other and, unwrapping them after they are drawn, we would find each marble to be green with probability $p$ and red otherwise. Therefore,

$Y \sim \text{Bin}(n,p)$

Notice that $N$ disappears and contributes nothing to the unconditional distribution. This result holds even if we replace the $\text{Bin}$ distribution with a $\text{Beta-Bin}$ distribution.

We now get back to our original problem. Knowing $X_2|K-X_1$ and $K-X_1|X_1$, we can immediately see that $X_2|X_1$ is Beta-binomially distributed. Therefore,

$X_2|X_1 \sim \text{Beta-Bin}(n_2,\alpha+X_1,\beta+n_1-X_1)$

If, for example, we started with an Urn of $N$ balls (in which some are white and some are black) and drew $n$ balls without replacement and found all of them to be white, then

$X_2|X_1=n \sim \text{Beta-Bin}(n_2,1+n,1)$

If the size of the second sample is 1, then this clearly shows that

$\displaystyle \mathbb{E}(X_2|X_1=n)=\frac{n+1}{n+2}$

Very surprisingly, this is exactly the same answer we would get even if we had made the draws with replacement (which is exactly the Sunrise problem)!!
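As a quick check (a minimal Mathematica sketch assuming the uniform, i.e. $\text{Beta-Bin}(N,1,1)$, prior used above), the answer really is independent of the population size:

Clear["Global`*"];
(* Finite-population sunrise check: NN balls, uniform prior on the number of white balls,
   the first n draws (without replacement) are all white; probability the next draw is white *)
nextWhite[NN_, n_] := Module[{num, den},
  num = Sum[(Binomial[k, n]/Binomial[NN, n]) (k - n)/(NN - n), {k, n, NN}];
  den = Sum[Binomial[k, n]/Binomial[NN, n], {k, n, NN}];
  num/den];
{nextWhite[10, 3], nextWhite[1000, 3]}   (* both 4/5 = (n+1)/(n+2), independent of NN *)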

We have so far generalized the Birthday problem, the Coupon collector problem and the Problem of points in our blog, and in every case the 'finiteness' of the population alters the final result in one way or the other. Even in this case, at the outset, I doubt many would have guessed that the probability would remain unchanged in the finite case.

Until then
Yours Aye
Me

Expected number of cards drawing two Aces in a row

Let's say we shuffle a standard 52-card deck and draw cards one by one, looking for consecutive cards that form a pair (two cards with the same rank).

I was interested in this question after seeing this StackExchange post. Though the post does not offer an exact solution, the approximate solution given there was really interesting. When I solved the question finding the exact solution, I was really surprised by how good the approximation is.

In this post, we will be solving a similar question both exactly and approximately. We are interested in the expected number of draws without replacement before drawing two Aces in a row.

Just to be clear, there may be cases where we exhaust all the cards without finding two Aces in a row. In these cases, the number of cards drawn will obviously be 52.

Let $N$ be the random variable indicating the number of draws and we want $\mathbb{E}(N)$.

To solve this in exact form, we first need the probability of drawing $j$ Aces when drawing $k$ cards from the deck such that no two of the Aces are adjacent. Let's denote this event by $T_{j,k}$. Then it is easy to see that,

$\displaystyle\mathbb{P}(T_{j,k})=\mathbb{P}(j\text{ Aces in }k\text{ draws})\cdot\mathbb{P}(\text{Aces are not consecutive}\mid j\text{ Aces in }k\text{ draws})$

The first term is textbook Hypergeometric. For the second term, we lay out the $k-j$ non-Ace cards, giving us $k-j+1$ spaces around them. We simply have to choose $j$ of these spaces for the Aces. Therefore,

$\displaystyle\mathbb{P}(T_{j,k})=\frac{\binom{4}{j}\binom{48}{k-j}}{\binom{52}{k}}\cdot \frac{\binom{k-j+1}{j}}{\binom{k}{j}}$

Therefore, the probability that $N>k$, i.e., that we have no consecutive Aces among the first $k$ cards drawn from the deck, is

$\displaystyle\mathbb{P}(N>k)=\sum_{j=0}^4\mathbb{P}(T_{j,k})$

Note that $N$ can only take values between $0$ and $52$ (both inclusive). Therefore,

$\displaystyle\mathbb{E}(N)=\sum_{k=0}^{51}\mathbb{P}(N>k)=\sum_{k=0}^{51}\sum_{j=0}^4\mathbb{P}(T_{j,k})$


$\displaystyle\mathbb{E}(N)=\frac{257026}{5525}\approx 46.5205$

The simplified fraction kind of hints that there may be a more direct way of solving this. I'll update if I find something on this.

To solve this approximately, note that $\mathbb{P}(N>0)=\mathbb{P}(N>1)=1$. Also,

$\displaystyle \mathbb{P}(N>2)=\mathbb{P}(\text{first two cards are not consecutive Aces})=1-\mathbb{P}(\text{first two cards are Aces})=1-\frac{4}{52}\frac{3}{51}=\frac{220}{221}$

Now comes the approximation. On each subsequent draw, we assume that the probability that the current and the previous draws do not form two consecutive Aces is still $220/221$ and that these events are independent. Therefore,

$\displaystyle \mathbb{P}(N>k)\approx\left(\frac{220}{221}\right)^{k-1}$ for $k>0$

This gives

$\displaystyle\mathbb{E}(N)=\mathbb{P}(N>0)+\sum_{k=1}^{51}\mathbb{P}(N>k)\approx 1+\sum_{k=1}^{51}\left(\frac{220}{221}\right)^{k-1}\approx 46.6350$

That's already a great approximation, deviating from the correct value by less than a quarter of a percent.

If we further use an exponential approximation (which can also be obtained directly using a Poisson approximation), we have

$\displaystyle \mathbb{E}(N)\approx 1+51\frac{1-e^{-51\cdot 1/221}}{51\cdot 1/221}\approx 46.5431$

That's an insane level of accuracy!!

The following Mathematica code is used for this post.

Clear["Global`*"];
ProbG[k_] := Sum[Binomial[k - j + 1, j] / Binomial[k, j] * Binomial[4, j] Binomial[48, k - j] / Binomial[52, k], {j, 0, Min[4, k]}];
res = Sum[ProbG[k], {k, 0, 51}];
N[res, 15]
lam = 51 * Binomial[4, 2] / Binomial[52, 2];
approx = 1 + 51 * (1 - Exp[-lam]) / lam;
N[approx, 15]
p = Binomial[4, 2] / Binomial[52, 2];
approx = 1 + (1 - Power[1 - p, 51]) / p;
N[approx, 15]

The following code is the simulation for the problem.

Clear["Global`*"];
n = 10000000;
res = 0;
Do[
    test = Split[Mod[RandomSample[Range[0, 51]], 13]];
    Do[
        If[Length[k] <= 1,
            res += 1;,
            If[First[k] > 0, res += Length[k];, res += 2; Break[];];
        ];
    ,{k, test}
    ];
, {j, n}
];
res / n
N[%]

Until then
Yours Aye
Me

Monday, August 22, 2022

Factorial moments of the Binomial and related distributions

Factorial moments are defined as the expected value of the falling factorial of a random variable. In this post, we are going to try to compute the (scaled) Factorial moments of certain distributions without a direct computation.

The Factorial moment of a random variable $X$ is given by

$$\mathbb{E}[(X)_r]=\mathbb{E}[X(X-1)(X-2)\cdots (X-r+1)]$$

For our purposes, we focus on the following definition, which differs from the above only by a constant factor.

$$\displaystyle\mathbb{E}\left[\binom{X}{r}\right]=\mathbb{E}\left[\frac{(X)_r}{r!}\right]$$

Let's start with the famous Binomial distribution. The Binomial random variable $X$, with parameters $n$ and $p$, is the number of successes in a sequence of $n$ independent draws (with replacement), each with a success probability $p$.

Taking a cue from Problem 7 of Strategic Practice 8 of Stat 110, $\binom{X}{k}$ counts the number of $k$-sets of draws in which every draw results in a success. Creating an indicator random variable for each such set and using the linearity of expectation, we have

$$\displaystyle\mathbb{E}\left[\binom{X}{k}\right]=\binom{n}{k}\cdot p^k$$

We now move on to the Hypergeometric distribution (which is exactly the Stat 110 problem quoted above). Let $X$ denote the number of white balls obtained in a sample of $n$ draws from an Urn containing $N$ balls of which $K$ are white. Then,

$$\displaystyle\mathbb{E}\left[\binom{X}{k}\right]=\binom{n}{k}\cdot \frac{\binom{K}{k}}{\binom{N}{k}}$$

That is, we've considered an Indicator random variable for each $k$-set of draws and used the Linearity of Expectation. Note that $X$ is the number of successes where a 'success' is viewed as a draw that gives a white ball.

Alternately, we can view a 'success' as a white ball that gets drawn. This way, we can solve the problem by considering an Indicator random variable for each $k$-set of white balls. Then,

$$\displaystyle\mathbb{E}\left[\binom{X}{k}\right]=\binom{K}{k}\cdot \frac{\binom{k}{k}\binom{N-k}{n-k}}{\binom{N}{n}}$$

Again, using the Linearity of Expectation, we restrict our attention to only a particular $k$-set of white balls. The probability that we are interested is that this particular set gets drawn when drawing $n$ balls. This is again a Hypergeometric distribution where only this particular set constitute 'successes' and rest are considered failures.

We now get to the Negative Binomial distribution with parameters $r$ and $p$, where $X$ denotes the number of failures we encounter before the $r$'th success. It can be verified with a direct calculation that

$$\displaystyle\mathbb{E}\left[\binom{X}{k}\right]=\binom{k+r-1}{k}\left(\frac{q}{p}\right)^k$$

While the calculation itself is not hard, I could not get a nice 'story' proof for the above. Rather, we come back to this problem after the next case.
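In the meantime, here is a quick numerical check of the claim (a minimal Mathematica sketch with arbitrary sample parameters; Mathematica's NegativeBinomialDistribution counts failures before the $r$'th success, matching the convention here):

Clear["Global`*"];
(* Numerical check of E[Binomial[X, k]] = Binomial[k + r - 1, k] (q/p)^k for X ~ NB(r, p) *)
r = 3; p = 2/5; q = 1 - p; k = 2;
lhs = Sum[Binomial[x, k] N[PDF[NegativeBinomialDistribution[r, p], x]], {x, 0, 500}];
rhs = N[Binomial[k + r - 1, k] (q/p)^k];
{lhs, rhs}   (* both should be 13.5 *)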

We now move to the Negative Hypergeometric distribution, which will be the most interesting aspect of this post. Here, again in the context of Urns and balls, $X$ denotes the number of black balls (failures) we get before the $r$'th white ball (success) from a population of $N$ balls in which $K$ are white.

Here again, $\binom{X}{k}$ counts the $k$-sets of black balls (failures) that get drawn before the $r$'th white ball (success). Let $I_j,j \in \{1,2,\cdots,N-K\}$ be $1$ if the $j$th black ball (failure) gets drawn before the $r$'th white ball (success) and $0$ otherwise. Let the indicator random variable $I_S$ be the product of all the indicator random variables indexed by the set $S$. Then,

$\displaystyle\mathbb{E}\left[\binom{X}{k}\right] =\sum_{S_k \subset \{1,2,\cdots,N-K\}}\mathbb{E}(I_{S_k})=\binom{N-K}{k}\mathbb{E}(I_1I_2\cdots I_k)=\binom{N-K}{k}\mathbb{P}(E)$

where $S_k$ denotes a $k$-element subset and $E$ denotes the event that failures $1$ to $k$ occur before the $r$'th success.

In other words, the probability that we want is the probability that we draw (without replacement) $k$ specific black balls from a bag containing $K$ white balls and $N-K$ black balls before the $r$'th white ball. This is a nice question by itself.

The key is to realise that we are concerned only about the $k$ specific black balls. This means we can completely ignore the remaining $N-K-k$ black balls and focus only on the $K$ white balls and the specific $k$ black balls.

The probability that we want is then the probability of getting $k$ black balls before the $r$'th white ball from an Urn containing $K+k$ balls of which $K$ are white. But that is exactly Negative Hypergeometric. Therefore,

$\displaystyle \mathbb{E}\left[\binom{X}{k}\right]=\binom{N-K}{k}\binom{k+r-1}{k}\frac{\binom{K+k-k-r}{K-r}}{\binom{K+k}{K}}=\binom{k+r-1}{k}\frac{\binom{N-K}{k}}{\binom{K+k}{k}}$

Alternately, the required probability can be seen as the probability that the first $k+r-1$ draws from an Urn containing $K+k$ balls (of which $K$ are white) consist of the $k$ black balls and $r-1$ white balls. Then,

$\displaystyle \mathbb{E}\left[\binom{X}{k}\right]=\binom{N-K}{k}\frac{\binom{k}{k}\binom{K}{r-1}}{\binom{k+K}{k+r-1}}=\binom{N-K}{k}\frac{\binom{K}{r-1}}{\binom{k+K}{k+r-1}}$
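As a quick check (a minimal Mathematica sketch; the sample parameters are arbitrary and the pmf used for $X$ is the standard Negative Hypergeometric one), all the expressions above agree with a direct summation:

Clear["Global`*"];
(* Direct check of E[Binomial[X, k]] for the Negative Hypergeometric case *)
NN = 10; KK = 6; r = 3; k = 2;   (* NN balls, KK white, X = black balls drawn before the r-th white *)
pmf[w_] := Binomial[w + r - 1, w] Binomial[NN - w - r, KK - r]/Binomial[NN, KK];
direct = Sum[Binomial[w, k] pmf[w], {w, 0, NN - KK}];
form1 = Binomial[k + r - 1, k] Binomial[NN - KK, k]/Binomial[KK + k, k];
form2 = Binomial[NN - KK, k] Binomial[KK, r - 1]/Binomial[k + KK, k + r - 1];
{direct, form1, form2}   (* all three agree *)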

Now we get back to the Negative Binomial case. Though I can't get a nice story proof for it, we can note that the first expression of the Negative Hypergeometric can be alternatively written as

$\displaystyle \mathbb{E}\left[\binom{X}{k}\right]=\binom{k+r-1}{k}\frac{\binom{N}{K+k}}{\binom{N}{K}}$

Using the asymptotic expression we derived in an earlier post, we can see that it matches with the result that we got from a direct calculation.

Hope you enjoyed the discussion. See ya in the next post.

Until then
Yours Aye
Me

Thursday, July 7, 2022

The Inverse Sum Theorem

This post talks about the 'Inverse Sum theorem' which is at the heart of the video for my SoME2 submission.


Consider two right angled triangles $OAC$ and $OBC$ having a common side $OC=1$ and right angled at $O$. Take $O$ as the origin with $A$ and $B$ on the $x$-axis and $C$ on the $y$-axis. Let $F$ and $E$ be the feet of the altitudes to the hypotenuses of the respective triangles. Let the line connecting $E$ and $F$ meet the $x$-axis at $D$. We claim that

$$\frac{1}{OD}=\frac{1}{OA}+\frac{1}{OB}$$

We give three proofs of this result: with coordinate geometry, with trigonometry and with circle inversion.

Co-ordinate Geometry
Let $\angle OCA=\alpha$. Because $OC=1$, we can immediately see that $OA=\tan \alpha$, $CA=\sec \alpha$ and $OF=\sin \alpha$. As $\angle AOF=\alpha$, the co-ordinates of point $F$ can be seen to be $F \equiv (\sin \alpha \cos \alpha, \sin^2 \alpha)$.

Similarly, if we have $\angle OCB=\beta$, then we have $E \equiv (\sin\beta \cos\beta, \sin^2\beta)$

Then the equation of line $EF$ is
$$\frac{y-\sin^2\beta}{\sin^2\alpha-\sin^2\beta}=\frac{x-\sin\beta\cos\beta}{\sin\alpha\cos\alpha-\sin\beta\cos\beta}$$
$D$ is the point where this line meets the $x$-axis. Therefore, setting $y=0$ in the above equation and solving for $x$ gives us $OD$. After some algebraic manipulation, this gives us
$$x=\frac{\tan\alpha\tan\beta}{\tan\alpha+\tan\beta}$$
Therefore,
$$\displaystyle\frac{1}{OD}=\frac{1}{\tan\alpha}+\frac{1}{\tan\beta}=\frac{1}{OA}+\frac{1}{OB}$$
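A quick numeric check of the claim in Mathematica (a minimal sketch using the coordinates derived above, with arbitrary sample angles):

Clear["Global`*"];
(* Numeric check of 1/OD = 1/OA + 1/OB for sample angles alpha and beta *)
alpha = Pi/5; beta = Pi/7;
f = {Sin[alpha] Cos[alpha], Sin[alpha]^2};   (* the foot of altitude F *)
e = {Sin[beta] Cos[beta], Sin[beta]^2};      (* the foot of altitude E *)
od = f[[1]] - f[[2]] (e[[1]] - f[[1]])/(e[[2]] - f[[2]]);   (* x-intercept of line EF *)
oa = Tan[alpha]; ob = Tan[beta];
Chop[N[1/od - (1/oa + 1/ob)]]   (* 0 *)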

Trigonometry
Using the generalized Angled Bisector theorem in triangle $OEB$,
$$\frac{OD}{DB}=\frac{OE\sin\angle OED}{BE\sin\angle BED}$$
Using the law of Sines in triangle $OEB$, we get $OE\sin\angle EOB=BE\sin\angle EBO$. Therefore,
$$\frac{OD}{DB}=\frac{\sin\angle EBO}{\sin\angle EOB}\frac{\sin\angle OED}{\sin\angle BED}$$

But $\angle EOB$ and $\angle EBO$ are complementary. So are $\angle BED$ and $\angle OED$. Therefore,
$$\frac{OD}{DB}=\frac{\tan\angle EBO}{\tan\angle BED}=\frac{OC}{OB}\frac{1}{\tan\angle BED}$$

If we now draw a circle with $OC$ as diameter, we know that $E$ and $F$ must lie on this circle because of Thales' theorem. Also, $\angle OEF$ and $\angle OCF$ must be equal because they are angles subtended by the same chord $OF$. Therefore,
$$\angle BED=90^\circ - \angle OEF=90^\circ-\angle OCF=\angle OAC$$
Therefore,
$$\frac{OD}{DB}=\frac{OC}{OB}\frac{OA}{OC}\implies \frac{OD}{OB-OD}=\frac{OA}{OB}$$
Solving for $OD$, we get the desired result.

Circle Inversion
Using the diagram from above, we see that $OD^2=DF\cdot DE$ because of the tangent-secant theorem. If we imagine a circle $c$ with center $D$ and $OD$ as radius, then the above relation tells us that $E$ and $F$ are inverses of each other w.r.t circle $c$.

Here comes the nice part. Consider two points $U$ and $V$ which are reflections of each other w.r.t a line $l$. Let's say $U$, $V$ and $l$ are all reflected w.r.t a line $m$ to give $U'$, $V'$ and $l'$. Now it should be easy to see that $U'$ and $V'$ are reflections of each other w.r.t line $l'$. The amazing thing here is that this relation holds in case of Circle inversion as well.

Now imagine a circle $c'$ with $C$ as center and $CO$ as radius. Point $O$ is invariant under an inversion w.r.t $c'$. Points $F$ and $A$ become inverses of each other under $c'$, whereas $E$ and $B$ become inverses. The circle $c$ we described above is invariant as it is orthogonal to circle $c'$.

We know that $E$ and $F$ are inverses w.r.t $c$. Therefore, $A$ (inverse of $F$ w.r.t $c'$) and $B$ (inverse of $E$ w.r.t $c'$) must be inverses of each other w.r.t $c$ (which is its own inverse under an inversion in $c'$).

The cross ratio $\lambda$ of any four points $A$, $B$, $C$ and $D$ is defined as
$$\lambda=\frac{AB}{AC}\frac{CD}{BD}$$
If one of these points is at infinity, then the terms containing that point disappear from the definition. It is amazing that Cross ratio is invariant under Circle inversion.

The Cross ratio of $O$, $D$, $A$ and $B$ is then
$$\lambda=\frac{OD}{OA}\frac{AB}{DB}$$

The inverses of these points under an inversion w.r.t $c$ become $O$, $\infty$, $B$ and $A$. Therefore, the cross ratio after inversion is
$$\lambda=\frac{BA}{OB}$$
Because cross ratio is invariant under inversion, we have,
$$\frac{OD}{OA}\frac{AB}{DB}=\frac{BA}{OB} \implies \frac{OD}{DB}=\frac{OA}{OB}$$
which is the same relation we had in the previous case.

Interestingly, we also see that $OD^2=DA\cdot DB$ because $A$ and $B$ are inverses of each other w.r.t a circle with radius $OD$.

Until then
Yours Aye
Me

Wednesday, May 18, 2022

Expected values with Bertrand's Paradox

Bertrand's paradox is a neat problem that shows what happens when we take the words 'at random' a bit too casually. In this post, we will be looking at some problems in the same setup and how the different methods yield answers for the 'seemingly' same question.

Bertrand's paradox asks for the probability of a 'random' chord in a unit circle being longer than the side of an inscribed triangle. We are interested in the expected length of such a chord. We now see how the expected length varies if we choose the chord according to the three methods in the paradox.

(1) Circumference method
In this method, we choose two points uniformly randomly on the circumference of the circle and connect those points to create the chord. After the first point is chosen, the second point is uniformly distributed in the circumference.

With this method, if we draw a tangent at the first point, then the angle between the chord and the tangent is uniform in the range $(0,\pi)$; by symmetry, we can restrict it to $(0,\pi/2)$. If we let that angle be $\theta$, then the length of the chord is simply $2\sin\theta$. Therefore,

$\displaystyle\mathbb{E}(L)=\int_0^{\pi/2}2\sin\theta\,\frac{d\theta}{\pi/2}=\frac{4}{\pi}\approx 1.2732$

(2) Radial method
In this method, we first choose a random radius. We then choose a point uniformly on this radius and draw a chord perpendicular to this line. If we let $x$ be the distance of the point from the origin, then the length of the chord is $2\sqrt{1-x^2}$.

Because $x$ is uniform in $(0,1)$, we have

$\displaystyle \mathbb{E}(L)=\int_0^12\sqrt{1-x^2}dx=\frac{\pi}{2}\approx1.5708$

But here's something interesting. If we use $x=\cos\theta$ in the above integral, the integral becomes,

$\displaystyle \mathbb{E}(L)=\int_0^{\pi/2}2\sin\theta\cdot\sin\theta\,d\theta$

From what we saw in the 'Circumference method', we know that the $2\sin\theta$ denotes the length of the chord. Hence, whatever remains after that must be the PDF of the variable $\theta$.

Therefore, the PDF, $f(\theta)$, for this case is $\displaystyle f(\theta)=\sin\theta$

(3) Midpoint method
Here, a point is chosen randomly in the circle and the unique chord which has this point as the midpoint is drawn. The expected length is slightly tricky here. The trick is to identify the parameter of interest here is the distance of the point from the origin.

If we let $r$ be that distance, then the length of the chord is $2\sqrt{1-r^2}$. But $r$ is not uniform. It is well known that the density function of $r$ is $2r$ for a unit circle. We then have

$\displaystyle \mathbb{E}(L)=\int_0^12\sqrt{1-r^2}\cdot 2r\,dr=\frac{4}{3}\approx 1.3333$

If we use the substitution $r=\cos\theta$ in the above integral, we can see that the PDF of $\theta$ in this case is $f(\theta)=\sin2\theta$

(4) Random point and line at random angle method
The video also specifies another method of creating a chord which involves choosing a point randomly in the circle and drawing a line at a random angle through this point. To make things clear, let $r$ be the distance between the point and the centre of the circle.

Now we draw a line perpendicular to the radial line containing this point. The angle, $t$, that the random line (that we draw to create the chord) makes with this perpendicular line is uniformly distributed between $(0,\pi)$.

It can then be shown that the distance between the random chord and the centre of the circle is $r\cos t$. Therefore,

$\displaystyle \mathbb{E}(L)=\int_0^{\pi/2}\int_0^12\sqrt{1-(r\cos t)^2}\cdot 2r\,dr\frac{dt}{\pi/2}=\frac{16}{3\pi}\approx 1.69765$

The above integral was solved using Wolfram Alpha.

Here we use the substitutions $u=r\cos t$ and $v=r\sin t$. Then we know that $dudv=rdrdt$. The limits of $r$ and $t$ covers the first quadrant of the unit circle and so should the limits of $u$ and $v$. Therefore,

$\displaystyle \mathbb{E}(L)=\int_0^{1}\int_0^{\sqrt{1-u^2}}2\sqrt{1-u^2}\cdot 2\,\frac{dvdu}{\pi/2}=\int_0^{1}2\sqrt{1-u^2}\cdot 2\sqrt{1-u^2}\,\frac{du}{\pi/2}$

If we now use $u=\cos\theta$, then we see that

$\displaystyle f(\theta)=\frac{2\sin^2\theta}{\pi/2}$

(5) Random point on a radial line and a line at random angle method
Just for the sake of needlessly complicating things, we look at another method where we choose a point randomly on a randomly chosen radial line and draw a line at a random angle through this point (the number of randoms in this sentence!). This is almost exactly like the previous case.

$\displaystyle \mathbb{E}(L)=\int_0^{\pi/2}\int_0^12\sqrt{1-(x\cos t)^2}\cdot \,dx\frac{dt}{\pi/2}=\frac{2}{\pi}(2G+1)\approx 1.80286$

where $G$ is the Catalan constant. Note that this integral can be expressed as the integral of the Elliptic integral of the second kind. With that and some known results, this can also be calculated manually.

Again using the substitution $u=x\cos t$ and $v=x\sin t$ with the Jacobian $dudv=xdxdt$, we see

$\displaystyle \mathbb{E}(L)=\int_0^{1}\int_0^{\sqrt{1-u^2}}2\sqrt{1-u^2}\cdot \,\frac{dvdu}{x\cdot\pi/2}$

But $x=\sqrt{u^2+v^2}$. Using this, the fact that the integral of the secant function is the inverse hyperbolic sine of the tangent function and the substitution $u=\cos t$, we can see that the PDF in this case is

$\displaystyle f(\theta)=\frac{\sin\theta\sinh^{-1}(\tan\theta)}{\pi/2}$
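As a sanity check, here is a minimal Mathematica sketch recovering all five expected lengths numerically from the tangent-angle PDFs derived above (the last PDF has an integrable endpoint singularity at $\pi/2$, so NIntegrate may warn but should still converge):

Clear["Global`*"];
(* Expected chord length for each method, computed from the tangent-angle PDFs *)
pdfs = {1/(Pi/2), Sin[t], Sin[2 t], 2 Sin[t]^2/(Pi/2), Sin[t] ArcSinh[Tan[t]]/(Pi/2)};
Table[NIntegrate[2 Sin[t] f, {t, 0, Pi/2}], {f, pdfs}]
N[{4/Pi, Pi/2, 4/3, 16/(3 Pi), 2 (2 Catalan + 1)/Pi}]   (* the closed forms above *)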

That sums up the Expected values. But we can do more. There was a recent Numberphile video on similar lines. The question discussed was to find the probability that two points selected at random in a circle lie on different sides of a 'random' chord. Here's where the PDFs we've found so far will be really useful. We'll see those in the next post.

Hope you enjoyed this.

Until then
Yours Aye
Me

Candles and Cake problem - Probabilities with Bertrand's Paradox

The video I referred to in the earlier post only shows a Monte Carlo solution but we can find the exact values quite easily using the density functions.

Because the two candles (or points) are assumed to be uniformly randomly distributed in the circle (or the cake), an important parameter that keeps repeating in all the cases below is the area of a circular segment, which can be given in terms of the central angle $t$ as $(t-\sin t)/2$. The probability that a point randomly chosen in a unit circle lies in this segment is then $(t-\sin t)/(2\pi)$.

Note that the densities we found in the earlier post are for the random variable $\theta$, the angle the chord makes with the tangent. The central angle subtended by the chord is twice this value.

Let $E$ denote the event that two randomly chosen points in the circle lie on opposite sides of the chord chosen according to the following methods.

(1) Circumference method
Like before, if we let $\theta$ be the angle between the tangent and the chord, the central angle becomes $2\theta$. The probability that a point randomly chosen in the circle lies in the segment created by this chord is then the ratio of the segment's area to that of the circle. Therefore,

$\displaystyle\mathbb{P}(E)=\int_0^{\pi/2}2\left(\frac{2\theta-\sin2\theta}{2\pi}\right)\left(1-\frac{2\theta-\sin2\theta}{2\pi}\right)\,\frac{d\theta}{\pi/2}=\frac{1}{3}-\frac{5}{4\pi^2}\approx 0.20668$

Even though we can solve this easily, Wolfram Alpha does the job perfectly.

(2) Radial line method
We know from our earlier post that the density function of the tangent angle in this case is just $\sin\theta$. Therefore, the required probability is

$\displaystyle\mathbb{P}(E)=\int_0^{\pi/2}2\left(\frac{2\theta-\sin2\theta}{2\pi}\right)\left(1-\frac{2\theta-\sin2\theta}{2\pi}\right)\cdot\sin\theta\,d\theta=\frac{128}{45\pi^2}\approx 0.28820$

This can also be simplified manually, but again Wolfram Alpha minimizes our effort.

Note that $\pi$ times this value is the expected distance between two points chosen at random inside a circle. Coincidence?

(3) Midpoint method
Again, using the density from the previous post, the probability in this case is,

$\displaystyle\mathbb{P}(E)=\int_0^{\pi/2}2\left(\frac{2\theta-\sin2\theta}{2\pi}\right)\left(1-\frac{2\theta-\sin2\theta}{2\pi}\right)\cdot\sin2\theta\,d\theta=\frac{1}{8}+\frac{2}{3\pi^2}\approx 0.19255$

Simple even without Wolfram Alpha.

(4) Random point and line at random angle method
Like the cases above we use the density function for this case that we already found in the previous post.

$\displaystyle\mathbb{P}(E)=\int_0^{\pi/2}2\left(\frac{2\theta-\sin2\theta}{2\pi}\right)\left(1-\frac{2\theta-\sin2\theta}{2\pi}\right)\cdot2\sin^2\theta\,\frac{d\theta}{\pi/2}=\frac{1}{3}\approx 0.33333$

Solved with Wolfram Alpha.

(5) Random point on a radial line and a line at random angle method
Diving right into the probability with the density function at our disposal,

$\displaystyle\mathbb{P}(E)=\int_0^{\pi/2}2\left(\frac{2\theta-\sin2\theta}{2\pi}\right)\left(1-\frac{2\theta-\sin2\theta}{2\pi}\right)\cdot\sin\theta\sinh^{-1}(\tan\theta)\,\frac{d\theta}{\pi/2}\approx 0.386408$

This time Wolfram Alpha only gives us a numerical approximation. But with Mathematica, we can confirm that the exact value is

$\displaystyle \mathbb{P}(E)=\frac{427+60\pi^2-480\log 2}{180\pi^2}$
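All five probabilities can similarly be checked numerically in Mathematica (a minimal sketch using the same tangent-angle PDFs as in the previous post):

Clear["Global`*"];
(* P(E) for each chord-generation method, from the tangent-angle PDFs of the previous post *)
seg[t_] := (2 t - Sin[2 t])/(2 Pi);   (* probability a uniform point lands in the segment cut off by the chord *)
pdfs = {1/(Pi/2), Sin[t], Sin[2 t], 2 Sin[t]^2/(Pi/2), Sin[t] ArcSinh[Tan[t]]/(Pi/2)};
Table[NIntegrate[2 seg[t] (1 - seg[t]) f, {t, 0, Pi/2}], {f, pdfs}]
N[{1/3 - 5/(4 Pi^2), 128/(45 Pi^2), 1/8 + 2/(3 Pi^2), 1/3, (427 + 60 Pi^2 - 480 Log[2])/(180 Pi^2)}]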


For the sake of completeness, let's finish this post with one more method that was used in the video. Here, the cut is created by choosing two random points on the circumference, while each candle is placed randomly on its own randomly chosen radius.

The first point on the circumference can be chosen to be the 'south pole' of the circle. After the second point is chosen on the circumference, let the chord subtend an angle of $2t$ at the centre. Using the symmetry of the problem, we can limit the range of $t$ to be in $(0,\pi/2)$.

We now calculate the probability that the first candle lies to the right of the chord and the second to the left. Let the radius selected for the first candle make an angle $x$ with the radius perpendicular to the chord. We get a non-zero probability only if $-t \leq x \leq t$. When $x$ lies in this range, the probability of the first candle landing to the right of the cut is $1-\cos t/\cos x$, and it is $0$ otherwise.

Similarly, let $y$ be the angle between the radius chosen for the second candle and the radius perpendicular to the cut. When $-t \leq y \leq t$, the probability of the second candle ending up to the left of the cut is $\cos t/\cos y$. Else, the probability is $1$.

Therefore, the probability in this case depends on the following two integrals.

$\displaystyle \mathcal{I}_1=\int_0^{\pi/2} \int_{-t}^t \int_{-t}^t \frac{\cos t}{\cos y}\left(1-\frac{\cos t}{\cos x}\right)\,dy\,dx\,dt$

$\displaystyle \mathcal{I}_2=\int_0^{\pi/2} \int_{-t}^t \int_{t}^{2\pi-t} \left(1-\frac{\cos t}{\cos x}\right)\,dy\,dx\,dt$

These are very hard to evaluate even with Mathematica. Numerically evaluating these shows,

$\displaystyle \mathbb{P}(E)=2\cdot\frac{\mathcal{I}_1+\mathcal{I}_2}{\pi/2\cdot 2\pi \cdot 2\pi}\approx 0.161612$
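The figure quoted above can be reproduced (slowly) with NIntegrate; a minimal sketch, which may emit convergence warnings near the corners of the region:

Clear["Global`*"];
(* The two triple integrals for the 'chord from the circumference, candles on random radii' case *)
i1 = NIntegrate[Cos[t]/Cos[y] (1 - Cos[t]/Cos[x]), {t, 0, Pi/2}, {x, -t, t}, {y, -t, t}];
i2 = NIntegrate[1 - Cos[t]/Cos[x], {t, 0, Pi/2}, {x, -t, t}, {y, t, 2 Pi - t}];
2 (i1 + i2)/((Pi/2) (2 Pi) (2 Pi))   (* ≈ 0.161612 *)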

Hope you enjoyed this.

Until then
Yours Aye
Me

Sunday, April 24, 2022

The Problem of points - Without Replacement

Consider an Urn with $A$ green balls and $B$ red balls. Two players A and B play a game where they draw balls from this Urn one at a time (without replacement). If the drawn ball is green, Player A gets a point. Else, Player B gets a point.

If Player A needs $a$ points to win the game whereas Player B needs $b$ points, the question that we address in this post concerns the probability that Player A wins the match.

As the title says, this is just the Problem of points but without replacement. We've already encountered the problem in our blog in the post titled A note on Partial fractions where we saw how Pascal and Fermat independently solved the problem with different reasoning.

Interestingly, Pascal's reasoning involves the (CDF of the) Neg. Binomial distribution whereas Fermat's idea involves the (CDF of the) Binomial distribution. The central insight in both their ideas is in realising that we'll need a maximum of $a+b-1$ games to conclude the match.

Let's consider the original Problem for points for a moment. If we let $p$ and $q$ represent the probability of A and B winning a single game, then using Pascal's idea,

$\displaystyle \mathbb{P}(\text{A wins the match})=\sum_{k=0}^{b-1}\binom{k+a-1}{k}p^aq^k$

where the individual terms are probabilities from the Neg. Binomial distribution, $NB(a,p)$.

Using Fermat's idea,

$\displaystyle \mathbb{P}(\text{B wins the match})=\sum_{k=b}^{a+b-1}\binom{a+b-1}{k}q^kp^{a+b-1-k}$

where the individual terms are probabilities from the Binomial distribution, $B(a+b-1,p)$.

Now, using our discussion about the Neg. Hypergeometric distribution in a previous post and Pascal's idea, we could solve the Without-Replacement case of the Problem of points. Or we could go for simplicity and use the Hypergeometric distribution and Fermat's idea.

Either way, let $P(A,a,B,b)$ be the probability of Player A winning the game in the setup described at the start of the post and $M$ be a very large number. It should be clear that the classical case can be recovered from this setup as $P(pM,a,qM,b)$ with $p+q=1$ and $M \to \infty$.

Obviously, when $A=B$ and $a=b$, either player has an equal chance of winning because of symmetry. Similarly, when $A=B$ and $a<b$, Player A has an advantage. Same goes when $a=b$ and $A>B$.

But consider $p(50,25,40,20)$. Player A has 10 more balls than Player B but needs 5 more points to win the game. It isn't immediately clear whether this is advantageous to Player A or not. In this case, the probability of Player A winning the match is only (approx.) 49.07%.

Naturally, we might think that as we break symmetry, the match skews one way or the other. Perhaps the most surprising result about the Without-Replacement case of the Problem of points is the one family of parameters that forms an exception.

Consider $p(21, 11, 19, 10)$. Player A has two more balls than Player B but needs one extra point to win the match. Surprisingly, this match is evenly poised. Not just this, for positive integer $n$,

$\displaystyle p(2n+1,n+1,2n-1,n)=\frac{1}{2}$

Clearly, the parameters that define the game lack symmetry. Still, the game is evenly poised between the two players. I could neither come up with a reason nor a simple explanation of why this should be the case. If you can, please post it in the comments below.


Clear["Global`*"];
pp[A_, a_, B_, b_] := (A / (A + B)) Binomial[A - 1, a - 1] Sum[Binomial[B, j] / Binomial[A + B - 1, j + a - 1], {j, 0, b - 1}];
p[A_, a_, B_, b_] := Sum[Binomial[j + a - 1, j] Binomial[A - a + B - j, B - j], {j, 0, b - 1}] / Binomial[A + B, B];
k = 2;
ListPlot[Table[p[2n + k, n + k, 2n - k, n], {n, 100}]]
p[50, 25, 40, 20]
N[%]
p[21, 11, 19, 10]
N[%]
p[26, 13, 22, 12]
N[%]


Until then
Yours Aye
Me

Saturday, April 23, 2022

Expected distance between two points inside a Sphere

This post picks up where we left off earlier. As discussed before, we want to find the expected distance between two points randomly selected inside an $n$-sphere.

Let the two points be $P$ and $Q$, and $O$ be the centre. The first step is to apply the technique of projective Reduction with $O$ as the scaling point. This shows that the expected length between two points inside an $n$-sphere is $2n/(2n+1)$ times the expected length between a point on the surface of the $n$-sphere and another point inside the $n$-sphere.

WLOG, let $P$ be the point on the surface and let's orient the sphere so that point $P$ becomes the south pole. The next step is to apply another Projective Reduction with the south pole as the scaling point. This step gives another factor of $n/(n+1)$ for the expected length. Note that while the first reduction preserves the uniformity of the points on the respective 'surfaces', it is not the case with the second reduction. This is what makes the problem tricky.

To find the density of $Q$, consider a random chord from point $P$ (the south pole) along with the axis on which $P$ lies. Let $\theta$ be the angle between the chord and the axis.

Rotating the chord about this axis cuts out a (for lack of a better word) spherical cone, which consists of a cone with its apex at the south pole and a spherical cap at the top (imagining the other end of the chord to be on the opposite side of $P$). With simple geometry, it is also apparent that the angle between $OQ$ and the axis is $2\theta$.

If we let $V_{\text{sph. cone}}(\theta, r)$ be the volume of the spherical cone from an $n$-sphere of radius $r$ and apex angle $\theta$, then we have,

$\begin{align} \displaystyle V_{\text{sph. cone}}(\theta, r) &= V_{\text{cone}} + V_{\text{sph. cap}}\\&=  \frac{1}{n}V_{n-1}(r \sin 2\theta)(r+r\cos2\theta)+\int_{0}^{2\theta}V_{n-1}(r \sin t) \,d(-r\cos t)\\ &= r^n\frac{v_{n-1}}{n}\sin^{n-1}2\theta \cdot 2  \cos^2\theta + r^nv_{n-1}\int_0^{2\theta}\sin^nt\,dt \\ &=\frac{2^nr^nv_{n-1}}{n}\sin^{n-1}\theta\cos^{n+1}\theta + r^nv_{n-1}\int_0^{2\theta}\sin^nt\,dt \end{align}$

We have used the fact that the volume of an $n$-dimensional cone is $1/n$ times the volume of the circumscribing 'cylinder'. For the volume of the cap, we have used the fact that the volume is the sum of the volumes of $(n-1)$-dimensional 'circles' of infinitesimal thickness (see here for more).

Now, $\frac{\partial^2 V_{\text{sph. cone}}}{\partial r\, \partial \theta}\,dr\,d\theta$ gives the volume between $V_{\text{sph. cone}}(\theta + d\theta, r+dr)$ and $V_{\text{sph. cone}}(\theta, r)$. When $Q$ lies in this infinitesimal volume, the distance between $P$ and $Q$ is just $2\cos\theta$.

Because we are only interested in the distribution of $Q$ on the surface of the $n$-sphere, we can integrate out the $r$. Therefore, the density is given by,

$\displaystyle f(\theta)=\frac{1}{v_n}\int_{0}^1\frac{\partial^2 V_{\text{sph. cone}}}{\partial r\, \partial \theta}\,dr$

We now get,

$\displaystyle f(\theta)=\frac{2^nv_{n-1}}{nv_n}((n-1)\cos^{n+2}\theta\sin^{n-2}\theta-(n+1)\sin^n\theta\cos^n\theta)+\frac{v_{n-1}}{v_n}\cdot 2\sin^n2\theta$

where we have used Leibniz's rule for differentiation under the integral sign for the last term. Further simplification then gives,

$\displaystyle f(\theta)=\frac{2^nv_{n-1}}{nv_n}(n-1)\sin^{n-2}\theta\cos^n\theta$

This nice simplification allows us to express the expected length $e_n$ in terms of the Beta function.

$\displaystyle e_n=\int_0^{\pi/2}2\cos\theta\cdot f(\theta)\,d\theta=\frac{2^n}{n}\frac{v_{n-1}}{v_n}(n-1)B(n/2-1/2, n/2+1)$

Finally, we see that the expected length $L$ between two points randomly selected inside an $n$-sphere is,

$\displaystyle \mathbb{E}(L)=\frac{2n}{2n+1}\frac{n}{n+1}\frac{2^n(n-1)}{n}\frac{B(n/2-1/2,n/2+1)}{B(n/2+1/2,1/2)}$
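The formula can be checked against the classical values for the disc and the ball (a minimal Mathematica sketch; the Monte Carlo part samples points in the 3-ball by rejection):

Clear["Global`*"];
(* Closed form for the expected distance between two uniform points inside a unit n-sphere *)
eL[n_] := (2 n/(2 n + 1)) (n/(n + 1)) (2^n (n - 1)/n) Beta[n/2 - 1/2, n/2 + 1]/Beta[n/2 + 1/2, 1/2];
{eL[2], eL[3]}   (* 128/(45 Pi) and 36/35, the classical disc and ball values *)
(* Monte Carlo check for n = 3: sample points in the ball by rejection and pair them up *)
pts = Select[RandomReal[{-1, 1}, {200000, 3}], Norm[#] <= 1 &];
Mean[Table[Norm[pts[[2 i - 1]] - pts[[2 i]]], {i, Floor[Length[pts]/2]}]]   (* ≈ 36/35 *)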

Interestingly, as $n\to \infty$, irrespective of whether we choose points on the surface or in the inside, the expected length goes to $\sqrt{2}$. I can't clearly see why, but if you have an idea, please do share it with us.

UPDATE 3 Jul 2023: Generalising this, we can see that

$\displaystyle \mathbb{E}(L^k)=\frac{2n}{2n+k}\frac{n}{n+k}\frac{2^{n+k-1}(n-1)}{n}\frac{B(n/2-1/2,n/2+k/2+1/2)}{B(n/2+1/2,1/2)}$

Using this and WA, we can see that

$\displaystyle \lim_{n \to \infty}\mathbb{E}(L^k)=\sqrt{2}^k$

Also, using the limit definition of the logarithm function,

$\displaystyle \mathbb{E}(\ln L)=-\frac{1}{n}+\frac{n}{2}\sum_{k=1}^\infty\frac{(-1)^{k-1}}{k(n+k)}$

This shows that the geometric mean of the distance between two points chosen uniformly at random inside a unit circle is $e^{-1/4}$ and that of the sphere is $2e^{-3/4}$.

Until then
Yours Aye
Me

Expected distance between two points on the Surface of a sphere

Problems in Geometric Probability are always fascinating. In this post, we are going to see two of the well known and challenging problems in this topic.

The first problem is about the expected distance between two points selected at random from a surface of an $n$-sphere. To clarify, I'm considering a circle to be a $2$-sphere, a sphere to be a $3$-sphere and so on.

This problem is relatively easy, as can be seen from this stackexchange post. Let $S_n(R)$ and $V_n(R)$ denote the surface area and the volume of an $n$-sphere with radius $R$. We know that,

$S_n(R)=s_nR^{n-1}$ and $V_n(R)=v_nR^n$

where $s_n=\dfrac{2\pi^{n/2}}{\Gamma(n/2)}$ and $v_n=\dfrac{2}{n}\dfrac{\pi^{n/2}}{\Gamma(n/2)}$

For the problem under consideration, we can choose one of the points to be the south pole of the $n$-sphere. Consider the axis that connects this point to the origin. A hyperplane perpendicular to this axis cuts the surface of the $n$-sphere to give an $(n-1)$-spherical surface.

If we consider multiple such hyperplanes parallel to one another along the axis, the $n$-spherical surface is sliced into multiple rings. The picture below shows this in case of a $3$-sphere (taken from this Stack Exchange post).


Because the second point is uniformly distributed on the $n$-spherical surface, the probability that it lies in a given ring is proportional to the area of that ring.

Let $\theta$ be the angle between the radius that contains the second point and the radius containing the first point. Then the distance between the points is $2\sin(\theta/2)$.

If we consider these rings to be infinitesimally small, then the density function of the second point lying in any of these rings is equal to the area of the ring. But this is easy. This is just the surface area of the  sliced $(n-1)$-hypersphere times $rd\theta$ (obviously $r=1$ for the unit sphere; used here for illustrative purposes). Therefore,

$\displaystyle \mathbb{E}(L)=\frac{1}{s_n}\int_0^\pi 2\sin(\theta/2)\cdot S_{n-1}(r\sin\theta)\cdot r\,d\theta=\frac{s_{n-1}}{s_n}\int_0^\pi 2\sin(\theta/2)\cdot \sin^{n-2}\theta \,d\theta$

Using Wolfram Alpha, we then see that

$\displaystyle \mathbb{E}(L)=\frac{\Gamma(n/2)}{\sqrt{\pi}\Gamma((n-1)/2)}\cdot \frac{2\sqrt{\pi}\Gamma(n-1)}{\Gamma(n-1/2)}=2\frac{\Gamma(n/2)}{\Gamma((n-1)/2)}\cdot \frac{\Gamma(n-1)}{\Gamma(n-1/2)}$
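A quick Mathematica check of this closed form against the familiar low-dimensional values (a minimal sketch):

Clear["Global`*"];
(* Expected distance between two uniform points on the surface of an n-sphere *)
eSurf[n_] := 2 Gamma[n/2] Gamma[n - 1]/(Gamma[(n - 1)/2] Gamma[n - 1/2]);
{eSurf[2], eSurf[3]}   (* 4/Pi for the circle and 4/3 for the sphere *)
(* the same values directly from the integral above *)
Table[Gamma[n/2]/(Sqrt[Pi] Gamma[(n - 1)/2]) Integrate[2 Sin[t/2] Sin[t]^(n - 2), {t, 0, Pi}], {n, 2, 3}]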

Now, the second problem asks for the expected distance between a point randomly selected on the $n$-spherical surface and a point selected on the $n$-spherical volume. This problem is more involved and requires a little more trickery. We'll see this in the next post.


Until then
Yours Aye
Me

Wednesday, March 23, 2022

Notes on the Hypergeometric Distribution and its Negative

I wanted to create a post about the Hypergeometric distribution, one of the most important distributions in probability theory, as a prelude to my next post.

The probability mass function of a Hypergeometric distribution with parameters $N$, $K$ and $n$ is given by

$\displaystyle f(k;N,K,n)=\mathbb{P}(X=k)=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$

The distribution has several symmetries which will be the key content of this post.

To start with, the role of successes and failures could be swapped. In terms of probability, this means that $f(k;N,K,n)=f(n-k;N,N-K,n)$. The binomial coefficients in the numerator of $f(k;N,K,n)$ get swapped with this transformation.

In the next setup, the role of drawn and not drawn elements could be swapped. In terms of probability, this gives $f(k;N,K,n)=f(K-k;N,K,N-n)$. This amounts to invoking the symmetry of the binomial coefficient $\binom{n}{k}=\binom{n}{n-k}$.

Finally, the role of drawn elements and successes could be swapped as well. In probability terms, this becomes $f(k;N,K,n)=f(k;N,n,K)$. In other words,

$\displaystyle \mathbb{P}(X=k)=\binom{n}{k}\frac{\binom{N-n}{K-k}}{\binom{N}{K}}$

I find this pretty for two reasons. Firstly, there are two 'nCk' terms with the lowercase pair in the numerator and the uppercase in the denominator which gives it a nice structure. Secondly, and most importantly, this brings out the parallel with the Binomial distribution.

Comparing the probability of $k$ successes which we have above with that of a $\text{Bin}(n,K/N)$, we have,

$\displaystyle \frac{\binom{N-n}{K-k}}{\binom{N}{K}} \approx \left(\frac{K}{N}\right)^k\left(1-\frac{K}{N}\right)^{n-k}$

where $N \gg n$ and the ratio of $K$ to $N$ is held fixed. This can be seen as an asymptotic expression for the ratio of two binomial coefficients.

To make better use of this asymptotic expression, let $K=pN$ and $p+q=1$. Then, the above can be rewritten as

$\displaystyle p^iq^j \approx \frac{\binom{N-i-j}{K-i}}{\binom{N}{K}}$

Lastly, all the above symmetries can be brought under a single umbrella by using a beautiful idea from Duality and Symmetry in the Hypergeometric distribution. Surprisingly, this idea is relatively unknown. The authors show that the probability associated with the Hypergeometric distribution can also be written as,

$\displaystyle \mathbb{P}(X=k)=\binom{N}{k,n-k,K-k}\bigg/ \binom{N}{n}\binom{N}{K}$

where the numerator should be understood as the multinomial coefficient.

Finally, we come to visit the lesser known relative of the Hypergeometric distribution - the Negative Hypergeometric distribution. Just like the Hypergeometric parallels the Binomial, the Neg. Hypergeometric parallels the Neg. Binomial distribution.

We could use the exact idea of Neg. Binomial to derive the probability expression for Neg. Hypergeometric. But considering what we've seen so far, we could do better.

Let $Y$ be a Neg. Binomial variable that counts the number of failures encountered before the $r$th success and $W$ be the corresponding Neg. Hypergeometric variable. We know that

$\displaystyle \mathbb{P}(Y=k)=\binom{k+r-1}{k}p^rq^k$

Now we just have to make use of the asymptotic expression above to derive the same for the Neg. Hypergeometric case. We have,

$\displaystyle \mathbb{P}(W=k)=\binom{k+r-1}{k}\frac{\binom{N-k-r}{K-r}}{\binom{N}{K}}$

Neat, isn't it?
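A small Mathematica check (a minimal sketch with arbitrary sample parameters) that this pmf is a genuine distribution and that it approaches the Neg. Binomial pmf for a large population:

Clear["Global`*"];
(* The Negative Hypergeometric pmf above: k failures before the r-th success, KK successes among NN *)
negHG[k_, NN_, KK_, r_] := Binomial[k + r - 1, k] Binomial[NN - k - r, KK - r]/Binomial[NN, KK];
Sum[negHG[k, 20, 8, 3], {k, 0, 12}]   (* 1 *)
(* comparison with NB(r, p) for a large population with K = p N *)
p = 2/5; big = 10000; r = 3;
TableForm[Table[{k, N[negHG[k, big, p big, r]], N[Binomial[k + r - 1, k] p^r (1 - p)^k]}, {k, 0, 4}]]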


Until then
Yours Aye
Me

Friday, January 28, 2022

Matt Parker and the London Lottery

In his (aptly) named video 'Matt Explains: The Lottery', Matt explains the probabilities associated with the London lottery. In the first video he made on this topic, he worked out the probability of winning the Jackpot with the new system where we have a chance to win an extra lottery.

To make things clear, here are the winnings that I understood from his video. Six numbers are drawn (without repetition) from a pool of 59 numbers. If all 6 numbers match, then we win the Jackpot. If we match 5 numbers, then we get a bonus number whose match wins us 50000 pounds, and 1000 pounds otherwise. If we match 4 numbers, we win 100 pounds, and 3 matches win us 25 pounds. Finally, under the new system, matching 2 numbers gives us no cash prize but gives us a new lottery which we can use the next time.

The number of matches is given by the Hypergeometric distribution, as explained (and calculated) by Matt in the video. In his first video, Matt ignores all cash prizes and focuses only on the Jackpot and the extra lottery.

If we let $p_k$ denote the probability of $k$ matches and $p$ denote the probability of ultimately winning the Jackpot, we have

$p=p_6+p_2p$ which upon simplification gives $\displaystyle p = \frac{p_6}{1-p_2}$

Substituting the known values, gives a probability of $p=2.459111 \times 10^{-8}$ (or) an Odds of 1 in 40665098.

As we will be dealing with extremely small probabilities, let's define 1 micro-bps as a millionth of a basis point. In this sense, the probability in the above case is approx. 245.91 micro-bps.

At this point, as per a Subscriber's suggestion, Matt tries to attack the problem where we repeatedly buy lotteries using all our winnings until we win the Jackpot. Here Matt calculates the expected number of lotteries won with a given lottery.

If we let $l$ denote the expected number of lotteries won per lottery, then

$l=p_2 \cdot 1 + p_3 \cdot 12.5 + p_4 \cdot 50 + p_{5,1} \cdot 500 + p_{5,2} \cdot 25000 \approx 0.257191$

where we have used the fact that each lottery costs 2 pounds. Matt now uses the following equation (indirectly) to get the probability of winning the Jackpot.

$p=p_6+l \cdot p \tag{1} \label{wrongeqn}$ which gives $\displaystyle p = p_6/(1-l)$.

Using the known values, we get $p=298.7830655$ micro-bps (or) an Odds of about 1 in 33469098.

But something is definitely amiss in $\eqref{wrongeqn}$. For example, we can easily construct a case where $l > 1$ which will then give negative probabilities. What then is the error, if any, in Matt's reasoning? Or is there a hidden assumption that was not explicitly stated here?

To understand what's going on here, we should use the idea that the probability of getting at least one win in $n$ trials is $1-(1-p)^n$ and interpret $p$ as the probability of winning at least one Jackpot. Using this, a better expression is given by,

$p=p_6+(1 - p_6)(1 - (1 - p)^\bar{l}) \tag{2} \label{crudeeqn}$

where $\bar{l}=l/(1-p_6)$ is the expected number of lotteries won conditional on not winning the Jackpot directly. The above equation then means that we either win the Jackpot or get some lotteries to continue our pursuit of winning the Jackpot. Not so surprisingly, this equation can be solved in closed form. We have,

$\displaystyle p = 1 - (1 - p_6)^{1 / (1 - \bar{l})} \tag{3} \label{cruderes}$

This gives a probability of $p=298.7830666$ micro-bps (or) an Odds of about 1 in 33469097.

If we assume $p \approx 0$ in $\eqref{crudeeqn}$, then using the approximation $(1+x)^n \approx 1+nx$ for small $x$, we end up with Matt's result. This clearly shows the hidden assumption in the result derived by Matt.

But even $\eqref{crudeeqn}$ is still not exactly correct. This can still result in a value that is less than 0 or greater than 1 in certain cases which doesn't make any sense.

This problem occurs because, in the RHS of $\eqref{crudeeqn}$, we casually make an assumption that $\mathbb{E}(1-(1-p)^L)=1-(1-p)^{\mathbb{E}(L)}$ where $L$ is the random number of lotteries won (conditional on not winning the Jackpot). But this is clearly wrong because of Jensen's Inequality.

Therefore, the correct way to do this would be to solve the following equation.

$p=p_6+p_{5,2}(1-(1-p)^{25000})+p_{5,1}(1-(1-p)^{500})+p_4(1-(1-p)^{50})+p_3(1-(1-p)^{12.5})+p_2(1-(1-p))$

where we explicitly condition on the resulting coupons. Solving this, we get $p=298.782539$ micro-bps (or) an Odds of about 1 in 33469156. 

This is good but there is still one final hiccup here. In case of 3 matches, we get 25 pounds, which we have assumed is equal to 12.5 lotteries. But non-integer tickets don't make much sense.

In case 3 matches win us 24 pounds, we would have $p=296.70581679$ micro-bps (or) an Odds of about 1 in 33703416. Had it been 26 pounds, we would have $p=300.888537999$ micro-bps (or) an Odds of about 1 in 33234897.

The precise way to model this is to split the problem into cases: one in which we have just a lottery and the other in which we have the lottery and a pound. If we let $p$ and $p_e$ be the probabilities of winning at least one Jackpot in the respective cases, then we have

$p=p_6+p_{5,2}(1-(1-p)^{25000})+p_{5,1}(1-(1-p)^{500})+p_4(1-(1-p)^{50})+p_3(1-(1-p_e)(1-p)^{11})+p_2(1-(1-p)) \tag{4} \label{firstsimul}$
and
$p_e=p_6+p_{5,2}(1-(1-p_e)(1-p)^{24999})+p_{5,1}(1-(1-p_e)(1-p)^{499})+p_4(1-(1-p_e)(1-p)^{49})+p_3(1-(1-p)^{13})+p_2(1-(1-p_e)) \tag{5} \label{secondsimul} $

Solving this simultaneous equations gives, $p=296.75282794$ micro-bps (or) an Odds of about 1 in 33698077 and $p_e=300.13462527$ micro-bps (or) an Odds of about 1 in 33318380.

These equations have to be solved numerically, which may be compute-intensive. A compromise would be to let Jensen's inequality slide and consider the following equations.

$p=p_6+(1-p_6)(1-(1-p)^{\bar{l}_{11}}(1-p_e)^{\bar{l}_{12}})$ and $p_e=p_6+(1-p_6)(1-(1-p)^{\bar{l}_{21}}(1-p_e)^{\bar{l}_{22}})$

where $l_{11}=25000p_{5,2}+500p_{5,1}+50p_4+11p_3+p_2$, $l_{12}=p_3$, $l_{21}=24999p_{5,2}+499p_{5,1}+49p_4+13p_3$ and $l_{22}=p_{5,2}+p_{5,1}+p_4+p_2$ are the (unconditional) expected lotteries and lottery-pounds in the respective cases, and $\bar{l}_k=l_k/(1-p_6)$ for $k \in \{11, 12, 21, 22\}$. 

Like before, this has the advantage of being solved in closed form. We have,

$p=1-(1-p_6)^{\bar{m}}$ and $p_e=1-(1-p_6)^{\bar{m}_e}$

where $\displaystyle \bar{m} = \frac{1-\bar{l}_{22}+\bar{l}_{12}}{(1-\bar{l}_{11})(1-\bar{l}_{22})-\bar{l}_{12}\bar{l}_{21}}$ and $\displaystyle \bar{m}_e=\frac{1-\bar{l}_{11}+\bar{l}_{21}}{(1-\bar{l}_{11})(1-\bar{l}_{22})-\bar{l}_{12}\bar{l}_{21}}$

Using the known values, we get $p=296.7533437$ micro-bps and $p_e=300.1351482$ micro-bps. Like before, if we assume $p_6$ is also small, then we get an even simpler linear relation.

Or we could assume both $p$ and $p_e$ are small in $\eqref{firstsimul}$ and $\eqref{secondsimul}$, which will give us a pair of linear simultaneous equations. Solving them would give us,

$\displaystyle p = mp_6$ and $\displaystyle p_e = m_ep_6$

where $m$ (and $m_e$) is the same as $\bar{m}$ (and $\bar{m}_e$) except with unconditional expectations. Either way, these approximations become susceptible to absurd probabilities if not applied carefully.

Now, for the next part, let's say we solve everything exactly. If we have 4 pounds, then we could buy 2 lotteries. If we assume that we don't mix the proceeds from one with the other, then the probability of winning at least one Jackpot is $1-(1-p)^2$, which with the small value approximation gives $2p$. Similarly, with 5 pounds, the probability of winning at least one Jackpot is $p+p_e$.

But with 6 pounds, we have a choice. We could either play three individual lotteries with winning probability $3p$ or two lottery-pounds with winning probability $2p_e$. With the values at hand, playing three individual lotteries is the clear choice.

This shows that it is better to convert all cash into lotteries to the fullest extent possible. Therefore, for integer $n$, we have,

$p_{2n}=np$ and $p_{2n+1}=(n-1)p+p_e$

All of the above were calculated using the following Mathematica code.

Clear["Global`*"];
e1 = Binomial[6, 1] Binomial[53, 5] / Binomial[59, 6];
e2 = Binomial[6, 2] Binomial[53, 4] / Binomial[59, 6];
e3 = Binomial[6, 3] Binomial[53, 3] / Binomial[59, 6];
e4 = Binomial[6, 4] Binomial[53, 2] / Binomial[59, 6];
e51 = Binomial[6, 5] Binomial[53, 1] / Binomial[59, 6] * 52 / 53;
e52 = Binomial[6, 5] Binomial[53, 1] / Binomial[59, 6] * 1 / 53;
e6 = Binomial[6, 6] Binomial[53, 0] / Binomial[59, 6];

f[p_] := e6 + e52 (1 - Power[1 - p, 25000]) + e51 (1 - Power[1 - p, 500]) + e4 (1 - Power[1 - p, 50]) + e3 (1 - Power[1 - p, 25 / 2]) + e2 p - p;
g[p1_, p2_] := {e6 + e52 (1 - Power[1 - p1, 25000]) + e51 (1 - Power[1 - p1, 500]) + e4 (1 - Power[1 - p1, 50]) + e3 (1 - (1 - p2) Power[1 - p1, 11]) + e2 p1,
e6 + e52 (1 - (1 - p2) Power[1 - p1, 24999]) + e51 (1 - (1 - p2) Power[1 - p1, 499]) + e4 (1 - (1 - p2) Power[1 - p1, 49]) + e3 (1 - Power[1 - p1, 13]) + e2 p2};

(* NSolve[p == e6 + e52 (1 - Power[1 - p, 25000]) + e51 (1 - Power[1 - p, 500]) + e4 (1 - Power[1 - p, 50]) + e3 (1 - Power[1 - p, 12]) + e2 p && 0 < p < 1, p] 
NSolve[p == e6 + e52 (1 - Power[1 - p, 25]) + e51 (1 - Power[1 - p, 5]) + e4 (1 - Power[1 - p, 3]) + e3 (1 - Power[1 - p, 2]) + e2 p && 0 < p < 1, p] *)

(* n = 1; nmax = 1000;
a = N[0, 50]; b = N[1 / 1024 / 1024, 50]; tol = Power[10, -75];
While[n <= nmax,
    c = (a + b) / 2; temp = f[c];
    If[Or[(b - a) / 2 < tol, Abs[temp] < tol tol], Break[];];
    n += 1;
    If[temp > 0, a = c;, b = c;];
];
c *)

(* With only Jackpot and extra lottery *)
p = e6 / (1 - e2);
N[p, 15]
Floor[(1 - p) / p]

(* Matts Solution *)
l = e2 + (25 / 2) e3 + 50 e4 + 500 e51 + 25000 e52;
p = e6 / (1 - l);
N[p, 20]
Floor[(1 - p) / p]

(* Ignoring Jensens *)
lb = l / (1 - e6);
p = 1 - Power[1 - e6, 1 / (1 - lb)];
N[p, 20]
Floor[(1 - p) / p]

(* FindRoot[f[p1], {p1, 1}, WorkingPrecision -> 25] *)

(* With 25 units *)
p = 2.9878253945 Power[10, -8];
N[p, 20]
Floor[(1 - p) / p]

(* With 24 units *)
p = 2.9670581679 Power[10, -8];
N[p, 20]
Floor[(1 - p) / p]

(* With 26 units *)
p = 3.00888537999 Power[10, -8];
N[p, 20]
Floor[(1 - p) / p]

(* With 25 units but with extra lottery*)
(* FindRoot[g[p1, p2] == {p1, p2}, {p1, 1}, {p2, 1}, WorkingPrecision -> 25] *)
p = 2.9675282794 Power[10, -8];
Floor[(1 - p) / p]
p = 3.0013462527 Power[10, -8];
Floor[(1 - p) / p]

(* Ignoring Jensens *)
l11 = 25000 e52 + 500 e51 + 50 e4 + 11 e3 + e2; l12 = e3;
l21 = 24999 e52 + 499 e51 + 49 e4 + 13 e3;l22 = e52 + e51 + e4 + e2;
l11b = l11 / (1 - e6); l12b = l12 / (1 - e6); l21b = l21 / (1 - e6); l22b = l22 / (1 - e6);
bdeno = (1 - l11b) (1 - l22b) - l12b l21b;
mb = (1 - l22b + l12b) / bdeno; meb = (1 - l11b + l21b) / bdeno;
p = 1 - Power[1 - e6, mb];
N[p, 15]
Floor[(1 - p) / p]
pe = 1 - Power[1 - e6, meb];
N[pe, 15]
Floor[(1 - pe) / pe]

(* Small value approximations *)
deno = (1 - l11)(1 - l22) - l12 l21;
N[e6 (1 + (l12 - l22)) / deno, 10];
N[e6 (1 + (l21 - l11)) / deno, 10];
m = (1 - l22 + l12) / deno;
me = (1 - l11 + l21) / deno;
N[e6 m, 10]
N[e6 me, 10]

(* temp[n_] := temp[n] = If[n == 2, p, pe];
Calc[n_] := Module[{res = {}, m = IntegerPartitions[n, All, {2, 3}]},
    If[n < 2, Return[0];];
    Do[
        res = Join[res, {1 - Times @@ (1 - temp /@ k)}];
    , {k, m}];
    Max[res]
];
Power[10, 10] Table[Calc[n], {n, 30}] *)


Hope you enjoyed the discussion. See you in the next post.

Until then
Yours Aye
Me