Thursday, June 29, 2023

'Leftmost' random points on Circles and Spheres

As usual, I was randomly (pun unintended) thinking about some problems in Geometric Probability and thought would share some of those.

Consider a unit circle centered at the origin. If we select $n$ points uniformly randomly in this circle, how 'left' the leftmost point be?

Mathematically, I'm looking for the expected value of the abscissa of the point with the 'minimum' x-coordinate. This way, this becomes an order statistic problem on a circle.

If we parametrize the chords perpendicular to the x-axis by the central angle 2$\theta$ they subtend, then the probability that a given point lies on the right of the chord is

$\displaystyle\mathbb{P}(\text{point lies on the right side of chord})=\frac{\theta-\sin\theta\cos\theta}{\pi}$

In other words, the above is the complementary CDF $\bar{F}(x)$ for a given value of $x=\cos\theta$.

Now the expected value of min. of $n$ i.i.d. random variables $X_1,X_2,\cdots,X_n$ each with a Complementary CDF of $\bar{F}(x)$ is given by

$\displaystyle\mathbb{E}(\text{min.})=\int x \,d(1-\bar{F}(x)^n)=x(1-\bar{F}(x)^n)-\int (1-\bar{F}(x)^n)\,dx$

The expected value of the min. abscissa is given by

$\displaystyle\mathbb{E}(\text{min. abscissa})=\int_\pi^0 \cos\theta \,d(1-\bar{F}(\cos\theta)^n)$

Therefore, in our case, we have

$\begin{align}\displaystyle\mathbb{E}(\text{min.})&=\cos 0 (1-\bar{F}(\cos 0)^n)-\cos\pi(1-\bar{F}(\cos\pi)^n)-\int_\pi^0 \left(1-\bar{F}(\cos\theta)^n\right)\,d(\cos\theta)\\ &= 1 - \int_0^\pi \sin\theta(1-\bar{F}(\cos\theta)^n)\,d\theta \\ &= \int_0^\pi \sin\theta\left(\frac{\theta-\sin\theta\cos\theta}{\pi}\right)^n\,d\theta -1 \end{align}$

From this, we can see

$\displaystyle \mathbb{E}(\text{min. abscissa among two points})=-\frac{128}{45\pi^2} \approx -0.2882$ using WA and

$\displaystyle \mathbb{E}(\text{min. abscissa among three points})=-\frac{64}{15\pi^2} \approx -0.4323$ using WA with

further values having a slightly complicated expressions. However, we can get somewhat decent approximations by noting the following graph.


Therefore,

$\displaystyle \mathbb{E}(\text{min. abscissa of }n\text{ points}) \approx \int_0^\pi \sin\theta \sin^{2n}(\theta/2)\,d\theta -1 = \frac{1-n}{1+n}$

which unfortunately only gets worse as $n$ gets larger.

A very similar approach can be used in case of a sphere as well just by noting that

$\displaystyle F(x)=\frac{1}{4}(2+\cos\theta)(1-\cos\theta)^2$

for $x=\cos\theta$ using the volume of a spherical cap.

Therefore,

$\displaystyle\mathbb{E}(\text{min. abscissa of }n\text{ points})=\int_0^\pi \sin\theta \left(\frac{(2+\cos\theta)(1-\cos\theta)^2}{4}\right)^n\,d\theta - 1=-1+\int_{-1}^1 \left(\frac{(2+x)(1-x)^2}{4}\right)^n\,dx$

Because of the polynomial integrand, the expected values in case of a sphere all turn out to be rationals. In fact by expanding the integral range (and adjusting later), we will be able to apply Laplace's method we can show that

$\displaystyle \mathbb{E}(\text{min. abscissa of }n\text{ points}) \approx \sqrt{\frac{\pi}{3n}}-1$

which get better for large $n$.

Infact using the power series of the log of integrand at its maximum of $x=-1$ and more terms for Laplace method like we did here, we can show that

$\displaystyle \int_{-2}^1 \left(\frac{(2+x)(1-x)^2}{4}\right)^n\,dx \approx \sqrt{\frac{4\pi}{3n}}\left(1-\frac{17}{72n}\right)$

which can be used to give a slightly better approximation.

More generally, by this point, it should be clear that for $d$-dimensional case, the expected value can be written as

$\displaystyle \mathbb{E}(\text{min. abscissa of }n\text{ points in }d\text{ dimensions})=\int_{-1}^1 \bar{F}_d(x)^n\,dx-1$

where $\bar{F}_d(x)$ is the ratio between the volume of $d$-ball cap to the volume of $d$-ball. By symmetry, the expected value for the 'rightmost' point will be the negative of this.

An associated question could be how far is the 'leftmost' point from the center of the $d$-ball?

Let $L_d$ denote the random variable we are interested in. In the circle case, we could construct the density function of $L_d$ by proceeding as follows:

First, any of the $n$ points could be the 'leftmost' points.

Second, the probability that the point lies at a distance $r$ from the center is $2rdr$.

Third, we note that this point will be uniformly distributed on the circle of radius $r$. The probability that the line joining the point to the origin makes an angle $\theta$ with the $x$-axis is $d\theta / \pi$ (considering only the upper half of the circle).

Finally, If we want this point to have the min. abscissa, the remaining points should have abscissa greater than $r\cos\theta$.

Therefore, the pdf of $L_2$ would be

$\displaystyle f(l_2) = n \cdot 2r dr \cdot \frac{d\theta}{\pi}\cdot \left(\frac{\cos^{-1}(r\cos\theta)-r\cos\theta\sqrt{1-r^2\cos^2\theta}}{\pi}\right)^{n-1}$

with $0 \le r \le 1$ and $0 \le \theta \le \pi$

We can verify that the integral of this pdf over the interval is indeed $1$ using WA.

We then have

$\displaystyle \mathbb{E}(L_2)=\int_0^1\int_0^\pi r\cdot n \cdot 2r \cdot \frac{1}{\pi}\cdot \left(\frac{\cos^{-1}(r\cos\theta)-r\cos\theta\sqrt{1-r^2\cos^2\theta}}{\pi}\right)^{n-1}\,d\theta\,dr$

Using WA, we can see

$\displaystyle \mathbb{E}(L_2\text{ with 2 points})=\frac{2}{3}$ and $\displaystyle \mathbb{E}(L_2\text{ with 3 points})=\frac{4200\pi^2+3360\log{2}-457}{6300\pi^2}\approx 0.69677$

We can do something similar in the Sphere case as well. The density function of $L_3$ would be

$\displaystyle f(l_3) = n \cdot 3r^2 dr \cdot \frac{2\pi r\sin\theta\cdot r d\theta}{4\pi r^2}\cdot \left(\frac{(2+r\cos\theta)(1-r\cos\theta)^2}{4}\right)^{n-1}$

using the idea that the first point will be uniformly distributed on the surface of a sphere with radius $r$ which could then be sliced into rings (as shown here) parametrized by $\theta$.

Again using WA, we can see

$\displaystyle \mathbb{E}(L_3\text{ with 2 points})=\frac{3}{4}$ and $\displaystyle \mathbb{E}(L_3\text{ with 3 points})=\frac{1719}{2240}\approx 0.7674$

Hope you enjoyed this. See ya later.

Until then
Yours Aye
Me

Saturday, June 17, 2023

Certain Inverse Sums involving Binomial Coefficients

Consider the PMF of the binomial function denoted here by $f(p,k,n)$. We know what happens to this function when it summed over $k$ or integrated over $p$ in the appropriate interval. That is,

$\displaystyle \sum_{k=0}^n\binom{n}{k}p^kq^{n-k}=1$ and $\displaystyle \int_0^1 \binom{n}{k}p^kq^{n-k} \,dp=\frac{1}{n+1}$

The first by the story of the binomial and the second by Bayes' billiard argument. But what is the sum of this PMF summed over all possible $n$? If we let $S$ denote the sum, we want

$\displaystyle S=\sum_{n=k}^\infty \binom{n}{k}p^kq^{n-k}$

This is not too hard. If we multiply the above by $p$, we can see that the summand becomes the PMF of the Negative binomial distribution and must therefore sum to $1$ by definition.  That clearly shows $S=1/p$.

We can the same for the PMF of the Hypergeometric distribution as well. That is we are interested in,

$\displaystyle S=\sum_{N=n+K-k}^\infty \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$

I wasn't able to solve this directly but with some luck and Wolfram Alpha, I was able to guess that

$\displaystyle S=\sum_{N=n+K-k}^\infty \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}=\frac{nK}{k(k-1)}$

At about this time, I saw the following two identities in PiMuEpsilon proved using telescoping sums.

$\displaystyle \sum_{n=k}^\infty\frac{1}{\binom{n}{k}}=\frac{k}{k-1}$ and $\displaystyle \sum_{n=m}^\infty\frac{1}{\binom{n}{k}}=\frac{k}{k-1}\binom{m-1}{k-1}^{-1}$

But with these many binomials, I was sure there must be some probabilistic interpretation for the same. Am I'm gladI found we are going to solve both of these using probability in this post.

Consider a Polya Urn containing $k$ marbles of which one is white and the rest are black. Our task is to pick up one black marble from this Urn in a maximum of $m$ draws. Because this a Polya Urn, everytime we draw a ball, we replace with that ball back in the Urn along with an additional ball of the same color.

Let $I_j$ ($0\leq j<m$)be an Indicator random variable for the event of picking a black marble in the '$j+1$'th draw. Then,

$\displaystyle\mathbb{P}(I_j=1)=\frac{1}{k}\cdot\frac{2}{k+1}\cdot\frac{3}{k+2}\cdots\frac{j}{k+j-1}\cdot\frac{k-1}{k+j}=\frac{k-1}{k}\binom{k+j}{j}^{-1}$

Therefore,

$\displaystyle\mathbb{P}(\text{picking a black marble in }m\text{ draws})=\sum_{j=0}^{m-1}\mathbb{P}(I_j=1)=\frac{k-1}{k}\sum_{j=0}^{m-1}\binom{k+j}{j}^{-1}$

On the other hand, probability of picking a black marble is the complementary probability of not picking a black marble in $m$ draws.

$\displaystyle\mathbb{P}(\text{Not picking a black marble in }m\text{ draws})=\frac{1}{k}\frac{2}{k+1}\cdots\frac{m}{k+m-1}=\binom{k+m-1}{m}^{-1}$

This clearly shows that

$\displaystyle\sum_{j=0}^{m-1}\binom{k+j}{j}^{-1}=\frac{k}{k-1}\left[1-\binom{k+m-1}{m}^{-1}\right]$

The above after suitable relabelling gives,

$\displaystyle\sum_{n=k}^{k+m-1}\binom{n}{k}^{-1}=\frac{k}{k-1}\left[1-\binom{k+m-1}{k-1}^{-1}\right]$

Both the PiMuEpsilon identities given above are easy corollaries of the above identity.

We could have also started with $a$ white marbles and $b$ black marbles. In this case we would have arrived at the following result.

$\displaystyle \frac{a}{a+b}\sum_{k=0}^{c-1}\binom{b-1+k}{b-1}\binom{a+b+k}{a+b}^{-1}=1-\binom{b+c-1}{c}\binom{a+b+c-1}{c}^{-1}$

The above after some relabelling can also be written as

$\displaystyle \frac{a}{a+b}\sum_{n=b}^{c-1}\binom{n-1}{b-1}\binom{n+a}{a+b}^{-1}=1-\binom{c-1}{b-1}\binom{a+c-1}{a+b-1}^{-1}$


Hope you enjoyed this. See ya later.

Until then
Yours Aye
Me