I write, therefore I am: April 2022

Sunday, April 24, 2022

The Problem of points - Without Replacement

Consider an Urn with $A$ green balls and $B$ red balls. Two players A and B play a game where they draw balls from this Urn one at a time (without replacement). If the drawn ball is green, Player A gets a point. Else, Player B gets a point.

If Player A needs $a$ points to win the game whereas Player B needs $b$ points, the question that we address in this post concerns the probability that Player A wins the match.

As the title says, this is just the Problem of points but without replacement. We've already encountered the problem in our blog in the post titled A note on Partial fractions where we saw how Pascal and Fermat independently solved the problem with different reasoning.

Interestingly, Pascal reasoning involves the (CDF of) Neg. Binomial distribution whereas Fermat's idea involves the (CDF of) Binomial distribution. The central insight in both their ideas is in realising that we'll need a maximum of $a+b-1$ games to conclude the match.

Let's consider the original Problem for points for a moment. If we let $p$ and $q$ represent the probability of A and B winning a single game, then using Pascal's idea,

$\displaystyle \mathbb{P}(\text{A wins the match})=\sum_{k=0}^{b-1}\binom{k+a-1}{k}p^aq^k$

where the individual terms are probabilities from the Neg. Binomial distribution, $NB(a,p)$.

Using Fermat's idea,

$\displaystyle \mathbb{P}(\text{B wins the match})=\sum_{k=b}^{a+b-1}\binom{a+b-1}{k}q^kp^{a+b-1-k}$

where the individual terms are probabilities from the Binomial distribution, $B(a+b-1,p)$.

Now using our discussion about the Neg. Hypergeometric distribution in the previous case and Pascal's idea, we could solve the Without-Replacement case of the Problem of points. Or we could go for simplicity and use the Hypergeometric distribution and Fermat's idea.

Either way, let $P(A,a,B,b)$ be the probability Player A winning the game in the setup described at the start of the post and $M$ be a very large number. It should be clear the classical case can be derived out of this setup as $P(pM,a,qM,b)$ with $p+q=1$.

Obviously, when $A=B$ and $a=b$, either player has an equal chance of winning because of symmetry. Similarly, when $A=B$ and $a<b$, Player A has an advantage. Same goes when $a=b$ and $A>B$.

But consider $p(50,25,40,20)$. Player A has 10 more balls than Player B but needs 5 more points to win the game. It isn't immediately clear whether this is advantageous to Player A or not? In this case, the probability of Player A winning the match is only (approx.) 49.07%.

Naturally, we might think as we break symmetry, the match skews towards one way. Perhaps the most surprising result about the Without-Replacement case of Problem of points is the one family of parameters that form an exception.

Consider $p(21, 11, 19, 10)$. Player A has two more balls than Player B but needs one extra point to win the match. Surprisingly, this match is evenly poised. Not just this, for positive integer $n$,

$\displaystyle p(2n+1,n+1,2n-1,n)=\frac{1}{2}$

Clearly, the parameters that define the game lacks symmetry. Still the game is evenly poised between the two players. I could neither come up with a reason nor a simple explanation of why this should be the case? If you can, please post it in the comments below.

Clear["Global`*"];
pp[A_, a_, B_, b_] := (A / (A + B)) Binomial[A - 1, a - 1] Sum[Binomial[B, j] / Binomial[A + B - 1, j + a - 1], {j, 0, b - 1}];
p[A_, a_, B_, b_] := Sum[Binomial[j + a - 1, j] Binomial[A - a + B - j, B - j], {j, 0, b - 1}] / Binomial[A + B, B];
k = 2;
ListPlot[Table[p[2n + k, n + k, 2n - k, n], {n, 100}]]
p[50, 25, 40, 20]
N[%]
p[21, 11, 19, 10]
N[%]
p[26, 13, 22, 12]
N[%]

Until then
Yours Aye
Me

Saturday, April 23, 2022

Expected distance between two points inside a Sphere

This post picks up where we left earlier. As discussed before, we want to find the expected distance between two points randomly selected in an $n$-sphere.

Let the two points be $P$ and $Q$, and $O$ be the centre. The first step is to apply the technique of projective Reduction with $O$ as the scaling point. This shows that the expected length between two points inside a $n$-sphere is $2n/(2n+1)$ times the expected length between a point on the surface of $n$-sphere and another point inside the $n$-sphere.

WLOG, let $P$ be the point on the surface and let's orient the sphere so that point $P$ becomes the south pole. The next step is apply another Projective Reduction with the south pole as the scaling point. This step gives another factor of $n/(n+1)$ for the expected length. Note that while the first reduction preserves the uniformity of the points on the respective 'surfaces', it is not the case with the second reduction. This is what makes the problem tricky.

To find the density of $Q$, consider a random chord from point $P$ (the south pole) along with the axis on which $P$ lies. Let $\theta$ be the angle between the chord and the axis.

Rotating the chord along this axis cuts out a (for a lack of a better word) spherical-cone which contains a cone with its apex at the south pole and a spherical cap at the top (Imaging the other end of the chord to be on the opposite side of $P$). With simple geometry, it is also apparent that the angle between $OQ$ and the axis is $2\theta$.

If we let $V_{\text{sp. cone}}(\theta, r)$ be the volume of the spherical cone from a $n$-sphere of radius $r$ and apex angle $\theta$, then we have,

$\begin{align} \displaystyle V_{\text{sph. cone}}(\theta, r) &= V_{\text{cone}} + V_{\text{sph. cap}}\\&= \frac{1}{n}V_{n-1}(r \sin 2\theta)(r+r\cos2\theta)+\int_{0}^{2\theta}V_{n-1}(r \sin t) \,d(-r\cos t)\\ &= r^n\frac{v_{n-1}}{n}\sin^{n-1}2\theta \cdot 2 \cos^2\theta + r^nv_{n-1}\int_0^{2\theta}\sin^nt\,dt \\ &=\frac{2^nr^nv_{n-1}}{n}\sin^{n-1}\theta\cos^{n+1}\theta + + r^nv_{n-1}\int_0^{2\theta}\sin^nt\,dt \end{align}$

We have used the fact that the area of an $n$-dimensional cone is $1/n$ times the volume of the circumscribing 'cylinder'. For the volume of the cap, we have used that the volume is the sum of the areas of $(n-1)$ dimensional 'circles' with infinitesimal thickness (see here for more).

Now, $\frac{\partial V_{\text{sph. cone}}}{\partial r \partial \theta}$ gives volume between $V_{\text{sph. cone}}(\theta + d\theta, r+dr)$ and $V_{\text{sph. cone}}(\theta, r)$. When $Q$ lies in this infinitesimal volume, the distance between $P$ and $Q$ is just $2\cos\theta$.

Because we are only interested in distribution of $Q$ on the surface of the $n$-sphere, we can integrate out the $r$. Therefore, the density is given by,

$\displaystyle f(\theta)=\frac{1}{v_n}\int_{0}^1\frac{\partial V_{\text{sph. cone}}}{\partial r \partial \theta}\,dr$

We now get,

$\displaystyle f(\theta)=\frac{2^nv_{n-1}}{nv_n}((n-1)\cos^{n+2}\theta\sin^{n-2}\theta-(n+1)\sin^n\theta\cos^n\theta)+\frac{v_{n-1}}{v_n}\cdot 2\sin^n2\theta$

where we have used Leibniz differential under integral sign for the last term. Further simplification then gives,

$\displaystyle f(\theta)=\frac{2^nv_{n-1}}{nv_n}(n-1)\sin^{n-2}\theta\cos^n\theta$

This nice simplification allows us to express the expected length $e_n$ in terms of Beta function.

$\displaystyle e_n=\int_0^{\pi/2}2\cos\theta\cdot f(\theta)\,d\theta=\frac{2^n}{n}\frac{v_{n-1}}{v_n}(n-1)B(n/2-1/2, n/2+1)$

Finally, we see that the expected length $L$ between two points randomly selected inside an $n$-sphere is,

$\displaystyle \mathbb{E}(L)=\frac{2n}{2n+1}\frac{n}{n+1}\frac{2^n(n-1)}{n}\frac{B(n/2-1/2,n/2+1)}{B(n/2+1/2,1/2)}$

Interestingly, at $n\to \infty$, irrespective of whether we choose points on the surface or the inside, the expected length goes to $\sqrt{2}$. I can't clearly see why but if you have an idea, please do share it with us.

UPDATE 3 Jul 2023: Generalising this, we can see that

$\displaystyle \mathbb{E}(L^k)=\frac{2n}{2n+k}\frac{n}{n+k}\frac{2^{n+k-1}(n-1)}{n}\frac{B(n/2-1/2,n/2+k/2+1/2)}{B(n/2+1/2,1/2)}$

Using this, we can see, using WA, that

$\displaystyle \lim_{n \to \infty}\mathbb{E}(L^k)=\sqrt{2}^k$

Also, using the limit definition of the logarithm function,

$\displaystyle \mathbb{E}(\ln L)=-\frac{1}{n}+\frac{n}{2}\sum_{k=1}^\infty\frac{(-1)^{k-1}}{k(n+k)}$

This shows us the geometric mean of distance between two points chosen uniformly randomly from a unit circle is $e^{-1/4}$ and that of the sphere is $2e^{-3/4}$

Until then

Yours Aye

Expected distance between two points on the Surface of a sphere

Problems in Geometric Probability are always fascinating. In this post, we going to see two of the well known and challenging problems in this topic.

The first problem is about the expected distance between two points selected at random from a surface of an $n$-sphere. To clarify, I'm considering a circle to be a $2$-sphere, a sphere to be a $3$-sphere and so on.

This problem is relatively easy as can seen from this stackexchange post. Let $S_n(R)$ and $V_n(R)$ denote the surface area and the volume of an $n$-sphere with radius $R$. We know that,

$S_n(R)=s_nR^{n-1}$ and $V_n(R)=v_nR^n$

where $s_n=\dfrac{2\pi^{n/2}}{\Gamma(n/2)}$ and $v_n=\dfrac{2}{n}\dfrac{\pi^{n/2}}{\Gamma(n/2)}$

For the problem under consideration, we can choose one of the points to be on the south pole of the $n$-sphere. Consider one of the axis that connects this point to the origin. A hyperplane perpendicular to this axis cuts the surface of the $n$-sphere to give a $(n-1)$-spherical surface.

If we consider multiple such hyperplanes parallel to one another along the axis, the $n$-spherical surface is sliced into multiple rings. The picture below shows this in case of a $3$-sphere (taken from this Stack Exchange post).

Because second point is uniformly distributed in the $n$-spherical surface, it is equally likely to be present in any of these rings.

Let $\theta$ be the angle subtended between the radius that contains the second point and radius containing the first point. Then the distance between the points is $2\sin(\theta/2)$.

If we consider these rings to be infinitesimally small, then the density function of the second point lying in any of these rings is equal to the area of the ring. But this is easy. This is just the surface area of the sliced $(n-1)$-hypersphere times $rd\theta$ (obviously $r=1$ for the unit sphere; used here for illustrative purposes). Therefore,

$\displaystyle \mathbb{E}(L)=\frac{1}{s_n}\int_0^\pi 2\sin(\theta/2)\cdot S_{n-1}(r\sin\theta)\cdot r\,d\theta=\frac{s_{n-1}}{s_n}\int_0^\pi 2\sin(\theta/2)\cdot \sin^{n-2}\theta \,d\theta$

Using Wolfram Alpha, we then see that

$\displaystyle \mathbb{E}(L)=\frac{\Gamma(n/2)}{\sqrt{\pi}\Gamma((n-1)/2)}\cdot \frac{2\sqrt{\pi}\Gamma(n-1)}{\Gamma(n-1/2)}=2\frac{\Gamma(n/2)}{\Gamma((n-1)/2)}\cdot \frac{\Gamma(n-1)}{\Gamma(n-1/2)}$

Now, the second problem asks for the expected distance between a point randomly selected on the $n$-spherical surface and a point selected on the $n$-spherical volume. This problem is more involved and requires a little more trickery. We'll see this in the next post.

Until then

Yours Aye