I write, therefore I am: 2021

Wednesday, November 10, 2021

Some more problems on a Deck of cards

We already saw how a standard deck of cards forms a nice setting for studying both the Coupon Collector problem and the Birthday problem without replacement. Here we extend those to some more problems, just for fun!

The first problem asks for the expected number of cards we need to draw to get at least one matching suit or one matching rank. This problem is relatively easy. We proceed using the same idea as before.

Let $X$ be the random variable denoting the number of cards needed to get at least one matching suit or rank. Then, the probability that we need more than $k$ cards to get this is

$\displaystyle \mathbb{P}(X>k)=\frac{k!\binom{13}{k}\binom{4}{k}}{\binom{52}{k}}$

That is, we are finding the probability that we have neither a matching suit nor a matching rank after drawing $k$ cards without replacement: we choose $k$ distinct ranks, $k$ distinct suits and one of the $k!$ ways of pairing them. Therefore,

$\displaystyle \mathbb{E}(X)=\sum_{k\geq 0}\mathbb{P}(X>k)$

So, we need, on average, $\approx 3.0799$ cards to get either a matching suit or a matching rank when drawing cards without replacement from a randomly shuffled standard deck of cards.

To simplify the text, let $S_k$ denote the event of having at least one matching suit after having drawn $k$ cards without replacement and $R_k$ denote the same event for rank. Then, we know $\mathbb{P}(S^c_k)$, $\mathbb{P}(R^c_k)$, and $\mathbb{P}(S^c_k\text{ and }R^c_k)$.

Using Inclusion-Exclusion principle, we have

$\displaystyle \mathbb{P}(S^c_k \text{ or } R^c_k)=\mathbb{P}(S^c_k)+\mathbb{P}(R^c_k)-\mathbb{P}(S^c_k\text{ and }R^c_k)$

This allows us to calculate the expected number of cards needed to get both a matching suit and a matching rank when drawing cards without replacement. If $X$ denotes this random variable, then

$\mathbb{P}(X>k)=1-\mathbb{P}(S_k\text{ and }R_k)=\mathbb{P}(S^c_k \text{ or } R^c_k)$

This shows that

$\displaystyle \mathbb{E}(S \text{ and } R)=\mathbb{E}(S)+\mathbb{E}(R)-\mathbb{E}(S\text{ or }R)\approx 3.2678 + 5.6965 - 3.0799=5.8844$

Therefore, we need, on average, $\approx 5.8844$ cards to get both a matching suit and a matching rank when drawing cards without replacement from a randomly shuffled standard deck of cards.
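The three tail probabilities above are easy to sum exactly. The following is a minimal Python sketch (the function names are my own, chosen only for illustration) that evaluates each expectation as $\sum_{k\geq 0}\mathbb{P}(X>k)$ with exact fractions and then combines them by inclusion-exclusion.

```python
from fractions import Fraction
from math import comb, factorial

DECK = 52  # 13 ranks x 4 suits

def tail_no_suit_match(k):
    # P(S_k^c): k cards from k different suits, 13 rank choices for each
    return Fraction(comb(4, k) * 13**k, comb(DECK, k))

def tail_no_rank_match(k):
    # P(R_k^c): k cards from k different ranks, 4 suit choices for each
    return Fraction(comb(13, k) * 4**k, comb(DECK, k))

def tail_no_match_at_all(k):
    # P(S_k^c and R_k^c): k distinct ranks, k distinct suits, k! pairings
    return Fraction(factorial(k) * comb(13, k) * comb(4, k), comb(DECK, k))

def expectation(tail):
    # E(X) = sum over k >= 0 of P(X > k); the tails vanish for large k
    return sum(tail(k) for k in range(DECK + 1))

e_suit = expectation(tail_no_suit_match)
e_rank = expectation(tail_no_rank_match)
e_either = expectation(tail_no_match_at_all)
e_both = e_suit + e_rank - e_either  # inclusion-exclusion on the tails

print(float(e_suit), float(e_rank), float(e_either), float(e_both))
```

Working with `Fraction` avoids any rounding until the final print, so the decimals can be compared directly with the values quoted above.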

Now we ask a similar question in the Coupon collector problem. That is, we seek the expected number of cards needed to collect all suits and all ranks when drawing cards without replacement from a well shuffled pack of cards.

Like we did before, we will be using the idea of the Negative Geometric random variable and the Min-Max identity. Let $X_k$ ($k \in \{\text{Spades}, \text{Clubs}, \text{Hearts}, \text{Diamonds}\}$) be the random number of draws needed without replacement to get the first card of suit $k$. Similarly, let $Y_k$ for $1\leq k \leq 13$ be the corresponding random variables for ranks.

If we let $Z$ denote the random number of draws needed to collect all suits and ranks, and $A$ be the set $\{X_S, X_C,\cdots,Y_1,Y_2,\cdots,Y_{13}\}$, then by the Min-Max identity, we know

$\displaystyle \mathbb{E}(Z)=\mathbb{E}(\text{max }A)=\sum_{M \subset_\emptyset A}(-1)^{|M|-1}\mathbb{E}(\text{min }M)$

where we have used $\subset_\emptyset$ to denote a non-empty subset.

It just remains to calculate the expected value of the minimum of each subset. This is not very difficult and we can proceed as we did in the earlier post. For example, let's solve for $M=\{X_S,X_C,Y_5,Y_6,Y_7\}$.

Then $\mathbb{E}(\text{min }M)$ is the expected number of cards needed to get either a Spade, a Club or any card of rank $5$, $6$ or $7$. This is again a Negative Geometric variable with population size $N=52$ and "successes" $K=13+13+4+4+4-6=32$. We are subtracting the six cards which are the 5, 6 and 7 of Clubs and Spades to avoid double counting.

Now using the expectation of Negative Geometric variable, we easily see that $\mathbb{E}(\text{min }M)=(52+1)/(32+1)$.

Expanding the idea, the expected value of $Z$ can be seen to be

$\begin{align}\displaystyle \mathbb{E}(Z)&=(52+1)\left[1-\sum_{i=0}^4\sum_{j=0}^{13}\binom{4}{i}\binom{13}{j}\frac{(-1)^{i+j}}{13i+4j-ij+1}\right]\\ &=(52+1)\left[1-(-1)^{4+13}\sum_{i=0}^4\sum_{j=0}^{13}\binom{4}{i}\binom{13}{j}\frac{(-1)^{i+j}}{52-ij+1}\right]\end{align}$

Evaluating this with Wolfram Alpha, we finally see that the number of cards needed, on average, to collect at least one card from each suit and rank when drawing cards without replacement from a shuffled deck of cards is $\approx 27.9998$, very close to an integer.
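For readers without Wolfram Alpha, the Min-Max sum can also be evaluated directly. This is a small Python sketch under the setup above (the function name and its default arguments are illustrative, not from any library): a subset with $i$ suits and $j$ ranks pools $13i+4j-ij$ successes, and each such subset contributes $(-1)^{i+j-1}(N+1)/(K+1)$.

```python
from fractions import Fraction
from math import comb

def expected_draws_all_suits_and_ranks(suits=4, ranks=13):
    """Min-Max identity over subsets made of i suits and j ranks."""
    n_cards = suits * ranks
    total = Fraction(0)
    for i in range(suits + 1):
        for j in range(ranks + 1):
            if i == 0 and j == 0:
                continue  # the empty subset is excluded
            # cards belonging to any of the i suits or j ranks
            successes = ranks * i + suits * j - i * j
            sign = (-1) ** (i + j - 1)
            e_min = Fraction(n_cards + 1, successes + 1)  # Negative Geometric mean
            total += sign * comb(suits, i) * comb(ranks, j) * e_min
    return total

value = expected_draws_all_suits_and_ranks()
print(float(value))
```

A tiny sanity case is a four-card deck with four suits and one rank, where every card must be drawn and the function returns exactly 4.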

Until then
Yours Aye
Me

Tuesday, November 9, 2021

Estimating the height of a Mountain (or a Building)

I recently watched Matt Parker's attempt to measure the Earth's radius with Hannah Fry. As always, Matt made it entertaining but there was something missing there which I wanted to share in this post.

I remember seeing the idea used in the video in History of Geodesy and the related idea of finding the height of a mountain by ancient mathematicians. The idea that I find the most problematic in the approach used in the video is the use of sextant in finding the tower's height.

In my opinion, parallax error induced in hand made sextants is difficult to control and can completely throw off estimates. While we can't do away with the sextant in finding the Earth's circumference, we certainly can do better in the other case.

All we need now is a pole of known length and two measurements using that pole, as shown in the figure above. That is, we first place the pole at some point and move ourselves into a position such that the top of the pole and the top of the mountain coincide in our line of sight. We then repeat the same at a different point.

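Let $OH$ be the height we are trying to measure. Let $AC=BD$ be the pole, whose length we take as unity to simplify calculations, with $A'$ and $B'$ the corresponding points from which we sight along the top of the pole. Now, using the similar triangles $\triangle OA'H \sim \triangle AA'C$ and $\triangle OB'H \sim \triangle BB'D$, we have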

$$\displaystyle \frac{OH}{AC}=\frac{OB'}{BB'}=\frac{OA'}{AA'}=\frac{B'A'}{BB'-AA'}=\frac{B'A'}{B'A'-BA}=\frac{AB+BB'-AA'}{BB'-AA'}$$

which is an expression of length measurements. Not only do we avoid sextants here, we also don't have to worry about measuring lengths from the mountain's (or building's) base.

Note that $A'$ and $B'$ are the only measurements in the expression which are prone to error because of the parallax. If we let $OH=y$ and $BB'-AA'=z$, then we can rewrite the above as

$$OH=y=AC\cdot \left(1+\frac{AB}{z}\right)$$

Differentiating the above expression, we see that

$$\displaystyle \frac{dy}{dz}=-AC\cdot \frac{AB}{z^2}$$

This shows that if $z$ is too small (which means $AA'$ and $BB'$ are almost equal), then even a small change in $z$ would result in a large change in the height. So the best thing is to make the first measurement close enough to the mountain and the second measurement far so as to reduce this effect.

Because we rely on line of sight in finding both $A'$ and $B'$, we are interested in some sort of interval estimate for the height. To this end, we assume that any measurements involving points $A'$ and/or $B'$ are independent Normal variables with variance $\sigma^2$. Now,

$$\displaystyle OH=AC \cdot \left(1+\frac{AB}{Z}\right)$$

where $Z =Z_{BB'}-Z_{AA'}\sim \mathcal{N}(BB'-AA',2\sigma^2)$.

Now to attain a confidence level of $\alpha$, we have

$\displaystyle \mathbb{P}(OH^{-} \leq OH \leq OH^{+})=\alpha$ where $\displaystyle OH^{\pm}=AC\cdot\left(1+\frac{AB}{BB'-AA' \mp z_\alpha \sqrt{2}\sigma}\right)$

Note that $z=BB'-AA'$ needs two measurements. We could alternately use a single measurement $A'B'$ which also is random with variance $2\sigma^2$. Practically though, as the two measuring points are far away, $A'B'$ will be long. In this case, we have,

$\displaystyle \mathbb{P}(OH^{-} \leq OH \leq OH^{+})=\alpha$ where $\displaystyle OH^{\pm}=AC\cdot\left(1-\frac{BA}{B'A' \mp z_\alpha \sqrt{2}\sigma}\right)^{-1}$
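The point estimate and the first interval above translate into a few lines of code. The sketch below is only illustrative: the function names and the sample numbers (a 2 m pole, pole positions 50 m apart, $z=1.25$ m, $\sigma=5$ cm) are hypothetical, and `z_alpha` stands for whichever Normal quantile matches the chosen confidence level.

```python
from math import sqrt

def height_estimate(pole, ab, z):
    """OH = AC * (1 + AB / z), where z = BB' - AA'."""
    return pole * (1.0 + ab / z)

def height_interval(pole, ab, z, sigma, z_alpha):
    """Interval from z ~ Normal(BB' - AA', 2 sigma^2)."""
    margin = z_alpha * sqrt(2.0) * sigma
    if z <= margin:
        # the interval blows up when z is comparable to the noise
        raise ValueError("z is too small relative to the measurement noise")
    low = pole * (1.0 + ab / (z + margin))   # larger z gives a smaller height
    high = pole * (1.0 + ab / (z - margin))  # smaller z gives a larger height
    return low, high

# Hypothetical measurements, all lengths in metres
estimate = height_estimate(2.0, 50.0, 1.25)
low, high = height_interval(2.0, 50.0, 1.25, sigma=0.05, z_alpha=1.96)
print(estimate, low, high)
```

Even with centimetre-level noise the interval is noticeably wide here, which illustrates why a small $z$ is best avoided.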

I plan to use this idea to estimate the height of a temple tower near my place. I'll update if I find anything.

Until then

Yours Aye

Saturday, September 4, 2021

Animating the Spherical Tautochrone and Isochrone

I recently made a submission to 3Blue1Brown's Summer of Math Exposition based on the idea of Spherical Tautochrones. It's a short clip and if interested you can watch the video on my channel.

While doing the video, I wanted to know the exact location of the particle at any given time to make the animation scientifically correct. But rather than make this about the spherical case, let's make it more general. Using the Conservation of energy we first have,

$\displaystyle\frac{ds}{dt}=\sqrt{2g(y_0-y)}$

where $y_0$ denotes the y-coordinate of the sliding particle at time $t=0$.

Using the condition we derived for the Tautochrone curve in the video,

$\displaystyle \frac{ds}{dy}=\frac{T_0}{\pi}\sqrt{\frac{2g}{y}}$

Using the above two equations, and noting that $y$ decreases as the particle slides down, we get

$\displaystyle \frac{dy}{dt}=-\frac{\pi}{T_0}\sqrt{y(y_0-y)}$

In my Python code for the video, I used SciPy's odeint to numerically solve this equation. But thinking about it, there was a much easier way to do this.

Using the pendulum analogy we saw in the video, we can multiply the 'force equation' of the pendulum by $l$, the length of the pendulum to get

$\displaystyle \frac{d^2s}{dt^2}+\frac{g}{l}s=0$

which is just the equation of the Simple Harmonic motion as noted in the video. Solving this for the initial condition that $s=s_0$ at $t=0$ and $s=0$ at $t=T_0$, we easily see that

$\displaystyle s=s_0\cos\left(\frac{\pi}{2}\frac{t}{T_0}\right)$

where we have used the expression $T_0=(\pi/2)\sqrt{l/g}$ to eliminate the $g/l$ term.

Also from the condition that we derived for tautochrones, we know that $s$ is directly proportional to $\sqrt{y}$. This immediately gives

$\displaystyle y=y_0\cos^2\left(\frac{\pi}{2}\frac{t}{T_0}\right)$

which gives the position of the particle at any time $t$. For the spherical co-ordinate, this reduces to

$\displaystyle \cos\left(\frac{\theta}{2}\right)=\cos\left(\frac{\theta_0}{2}\right)\cos\left(\frac{\pi}{2}\frac{t}{T_0}\right)$

This means, we could have simply used this explicit expression rather than numerically solving a differential equation in the animation!!
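As a quick check that the explicit expression really solves the differential equation, here is a small Python sketch using only the standard library. The helper names and the sample values of $y_0$ and $T_0$ are arbitrary assumptions; the check compares the magnitude of a central-difference derivative with $(\pi/T_0)\sqrt{y(y_0-y)}$.

```python
from math import cos, pi, sqrt

def y_closed_form(t, y0, t0):
    """y(t) = y0 * cos^2(pi t / (2 T0))."""
    return y0 * cos(0.5 * pi * t / t0) ** 2

def ode_residual(t, y0, t0, h=1e-6):
    """|dy/dt| - (pi / T0) * sqrt(y (y0 - y)), with dy/dt by central difference."""
    dydt = (y_closed_form(t + h, y0, t0) - y_closed_form(t - h, y0, t0)) / (2 * h)
    y = y_closed_form(t, y0, t0)
    return abs(dydt) - (pi / t0) * sqrt(y * (y0 - y))

Y0, T0 = 1.5, 2.0  # arbitrary starting height and descent time
worst = max(abs(ode_residual(0.1 * i, Y0, T0)) for i in range(1, 20))
print(worst)
```

In an animation loop one would simply call `y_closed_form` once per frame instead of integrating the equation.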

Suddenly then, I realized that this will not work when animating the (polar) Isochrone. But then it dawned on me that the polar velocity is constant by definition of the polar isochrone. Great!!

Until then

Yours always

Wednesday, August 18, 2021

Birds on a Wire - a Probability puzzle

This post is about birds on a wire - a nice probability puzzle that I recently saw on the cut-the-knot website. The site also gives five solutions, all of which are somewhat informal.

While all the solutions were pretty, I was particularly struck by the elegant solution by Stuart Anderson. Studying his solution made me realize that the condition that '$n$ is large' is not very significant. In fact, we'll use his idea to solve the question for the finite version of the problem.

Let the length of the wire be $1$ unit. Let $X_k$ ($1\leq k \leq n$) denote the position of the $k$th bird to sit on the wire. That is, all the $X_k$'s are independent standard uniform random variables.

Let $X_{(k)}$ be the random variable denoting the $k$th Order statistic. Now let's define new random variables $Y_k$ for $0 < k \leq n+1$ such that

$Y_k=X_{(k)}-X_{(k-1)}$ with $X_{(0)}=0$ and $X_{(n+1)}=1$.

Note that the $n$ birds divide the wire into $n+1$ segments and the random variable $Y_k$ denotes the length of the $k$th segment.

Using the joint distribution of the Order statistics of the Uniform distribution, we can see that the $Y_k$s are identically distributed (but not independent) $Beta(1,n)$ variables.

Therefore, the cumulative distribution for any $Y_k$ is $F(y)=\mathbb{P}(Y_k \leq y)=1-(1-y)^n$ and the expected value is $\mathbb{E}(Y_k)=1/(n+1)$.

Now if $n$ were large, we could use the idea that the above expression for the CDF is approximated by $1-e^{-ny}$ and the expected value by $1/n$. But as noted earlier, we don't have to assume $n$ is large and can proceed directly.

Let's define new variables $C_k$ such that

$\displaystyle C_k=\begin{cases}0, & \text{if }k \text{th segment is not colored}\\ Y_k, & \text{otherwise} \end{cases}$

It is straightforward to see that

(i) $\mathbb{E}(C_1)=\mathbb{E}(C_{n+1})=0$ because those segments never get coloured and

(ii) $\mathbb{E}(C_2)=\mathbb{E}(Y_2)=\mathbb{E}(C_n)=\mathbb{E}(Y_n)=1/(n+1)$ because those segments always get coloured.

For the cases $2<k<n$, consider the segment $Y_k$ along with its neighbours $Y_{k-1}$ and $Y_{k+1}$. A minute's thought shows that the $k$th segment is not coloured if and only if it is the biggest among the three variables.

If we let $Z_k=\max(Y_{k-1},Y_k,Y_{k+1})$, we can rewrite $C_k$ as

$\displaystyle C_k=\begin{cases}0, & Y_k=Z_k\\ Y_k, & \text{otherwise} \end{cases}$

But $\displaystyle \mathbb{P}(Y_k=Z_k)=\frac{1}{3}$ by symmetry. Therefore,

$\displaystyle \mathbb{E}(C_k)=\mathbb{P}(Y_k=Z_k)\cdot 0+\mathbb{P}(Y_k \ne Z_k) \mathbb{E}(Y_k|Y_k \ne Z_k)=\frac{2}{3}\mathbb{E}(Y_k|Y_k \ne Z_k)$

To evaluate the conditional expectation on the right, we expand $\mathbb{E}(Y_k)$ by conditioning on whether $Y_k=Z_k$. This gives,

$\displaystyle \mathbb{E}(Y_k)=\frac{1}{3}\mathbb{E}(Y_k|Y_k=Z_k)+\frac{2}{3}\mathbb{E}(Y_k|Y_k \ne Z_k)$

Rearranging this, we have

$\displaystyle \mathbb{E}(C_k)=\frac{2}{3}\mathbb{E}(Y_k|Y_k \ne Z_k)=\mathbb{E}(Y_k)-\frac{1}{3}\mathbb{E}(Y_k|Y_k=Z_k)=\mathbb{E}(Y_k)-\frac{1}{3}\mathbb{E}(Z_k)$

To compute the expected value of $Z_k$, we proceed in two parts.

First, consider any distribution with CDF $F(v)$ and the order statistics $V_{(i)}$, $V_{(j)}$ and $V_{(k)}$ with $i<j<k$.

It can be seen intuitively that the conditional distribution of $V_{(j)}$ given $V_{(i)}$ and $V_{(k)}$ is the same as the $(j-i)$th order statistic obtained from a sample of size $k-i-1$ whose distribution is $F(v)$ truncated on the left at $V_{(i)}$ and on the right at $V_{(k)}$.

For example, if we know that $V_{(5)}=0.4$ and $V_{(10)}=0.9$ from the standard uniform distribution $U(0,1)$, then $V_{(7)}$ has the same distribution as $W_{(2)}$ from a sample of size $4(=10-5-1)$ from the distribution $W\sim U(0.4,0.9)$.

This result can be obtained by using Theorems 2 and 3 given here.

Second, it is a well known result in probability (cuttheknot, here and here to quote a few) that if a line of length 1 unit is broken at $n-1$ points giving $n$ segments, then the expected length of the $k$th largest segment is $(1/k+\cdots+1/n)/n$. Specifically, the expected length of the largest segment in the 3 segment case is $11/18$.

Therefore,

$\displaystyle \mathbb{E}(Z_k)=\mathbb{E}(\mathbb{E}(Z_k|X_{(k-2)},X_{(k+1)}))=\mathbb{E}\left(\frac{11}{18}(X_{(k+1)}-X_{(k-2)})\right)=\frac{11}{18}\frac{3}{n+1}$

Using the above and substituting the known values, we get

$\displaystyle \mathbb{E}(C_k)=\frac{1}{n+1}-\frac{1}{3}\frac{11}{18}\frac{3}{n+1}=\frac{7}{18n+18}$

If we now let $C=C_1+C_2+\cdots+C_n+C_{n+1}$, then using the linearity of expectations, we get

$\displaystyle \mathbb{E}(C)=\frac{2}{n+1}+(n-3)\frac{7}{18n+18} =\frac{7n+15}{18n+18}$

Note that this applies only when $n\geq 3$ (cases $n<3$ are trivial to calculate). More importantly, this shows that the expected value tends to $7/18$ for large $n$.
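The finite-$n$ formula is easy to sanity-check by simulation. The sketch below is one possible way to do it in Python (function names, trial count and seed are my own choices): each bird marks the gap towards its nearest neighbour, and the marked gaps are summed.

```python
import random
from fractions import Fraction

def exact_coloured_fraction(n):
    """E(C) = (7n + 15) / (18n + 18), valid for n >= 3."""
    return Fraction(7 * n + 15, 18 * n + 18)

def coloured_length(n, rng):
    """One trial: total length of wire coloured by n >= 2 birds."""
    xs = sorted(rng.random() for _ in range(n))
    gaps = [xs[i + 1] - xs[i] for i in range(n - 1)]  # gaps between adjacent birds
    coloured = [False] * (n - 1)
    for i in range(n):
        left = gaps[i - 1] if i > 0 else float("inf")
        right = gaps[i] if i < n - 1 else float("inf")
        # the bird colours the gap to its nearest neighbour
        if left < right:
            coloured[i - 1] = True
        else:
            coloured[i] = True
    return sum(g for g, c in zip(gaps, coloured) if c)

def simulated_mean(n, trials, seed=0):
    rng = random.Random(seed)
    return sum(coloured_length(n, rng) for _ in range(trials)) / trials

print(float(exact_coloured_fraction(10)), simulated_mean(10, 20000))
```

The two end segments never appear in `gaps`, which matches the observation that they are never coloured.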

I would like to thank Abhilash Unnikrishan for pointing out the error in my original solution (I assumed that the $Y_k$s are independent) and for suggesting the 'expected value of largest segment in a line' result to simplify the calculations in arriving at the final result.

Hope you enjoyed the discussion.

Until then

Yours Aye

Sunday, July 11, 2021

Doubling the Cube

It's always interesting anytime we get to visit one of the three classical problems in Geometry. As the title of this post says, we are going to talk about Doubling the cube.

It is well known that doubling the cube is impossible with only a compass and straightedge but is tractable with a neusis ruler. One of the simplest such constructions is given on Wikipedia. We use a slightly modified version in this post.

Construct an equilateral triangle $ABC$. Draw a line through $B$ perpendicular to $BC$. We now use the neusis to find points $D$ (on line $AB$) and $E$ (on the perpendicular), with $C$, $E$ and $D$ collinear, such that $AB=ED$. Then, $CE=\sqrt[3]{2}AB$.

The wiki article quoted above does not prove this and I decided to try it on my own. To begin with, because $\angle CBE$ is a right angle, we see that $BE=\sqrt{CE^2-CB^2}$.

Now we construct a point $E'$ on $BD$ such that $EE'$ is parallel to $CB$.

Because $\triangle DEE' \sim \triangle DCB$, their sides are in proportion. Therefore,

$\displaystyle \frac{EE'}{CB} = \frac{DE}{DC} \implies EE'=BC \cdot \frac{BC}{BC+CE}$.

As $\triangle BE'E$ is a $30-60-90$ triangle, $BE=\sqrt{3}EE'$. Therefore,

$\displaystyle \sqrt{3}\frac{BC^2}{BC+CE}=\sqrt{CE^2-BC^2} \implies (CE^2-BC^2)(CE+BC)^2=3BC^4$.

If we let $CE=xBC$, then the above expression reduces to $(x^2-1)(x+1)^2=3$.

By the rational root test, we can guess that $-2$ is a root of the equation. Taking that factor out, the equation becomes $(x+2)(x^3-2)=0$, whose only positive root is $x=\sqrt[3]{2}$, thus proving that $CE=\sqrt[3]{2}BC$.

This last part of the proof is the one I find a little unsatisfactory. For one of the classical problems in Geometry, this proof is more algebraic than geometric.

I went in search of a more geometric proof myself. I couldn't find one, so I went on an internet search and, to my surprise, there aren't a lot. Finally, I found Nicomedes' proof in this page which gave me what I was looking for. I also got to know about Crockett Johnson's painting on Doubling the cube.

Luckily, I was able to modify (and simplify) the argument to get something that looks elegant to my eyes. This proof starts by constructing a line through $D$ perpendicular to $CB$. Let this line meet (the extension of) $CB$ at $J$.

As $\triangle CBE \sim \triangle CJD$,

$\displaystyle \frac{CB}{BJ}=\frac{CE}{ED} \implies CE \cdot BJ=CB \cdot ED=CB \cdot AB$

Note that $\triangle DBJ$ is a $30-60-90$ triangle. Therefore, $BD=2BJ$. This finally shows us that

$CE \cdot BD=2CB \cdot AB \tag{1} \label{reqn}$

Now comes the second part of our construction. Extend $AC$ and mark a point $F$ on it such that $CF=CD$. Join $D$ and $F$. Draw a line through $B$ perpendicular to $AD$ and mark its intersection with $AF$ as $G$. Draw a line through $G$ parallel to $AB$ and let it meet $DF$ at $H$. Join $H$ and $B$.

$\triangle GAB$ is a $30-60-90$ triangle. Therefore, $GA=2AB=2CB$.

Because $CD=CF$ by construction, $GF=AF-AG=(AC+CF)-2AB=CF-AB=CD-ED=CE$. Using this and $GA=2CB$, we recast $\eqref{reqn}$ to

$GF \cdot BD=GA \cdot AB \tag{2} \label{rreqn}$

Our plan now is to prove that $GH=AB$. As $\triangle FGH \sim \triangle FAD$, we have

$\displaystyle \frac{GH}{GF}=\frac{AD}{AF} \implies GH \cdot AF=GF \cdot ( AB+BD)=GF\cdot AB+GF\cdot BD$

Using $\eqref{rreqn}$ now, we have,

$\displaystyle GH \cdot AF=GF\cdot AB+GA\cdot AB=AB\cdot (GF+GA)=AB \cdot AF \implies GH = AB$

This also shows that $BH$ is parallel to $AF$. Therefore,

$\displaystyle\frac{GH}{GF}=\frac{AD}{AF}=\frac{BD}{BH} \tag{3} \label{feqn}$

because $\triangle FGH \sim \triangle FAD \sim \triangle HBD$.

We now construct a point $C'$ on $AB$ such that $CC'$ is perpendicular to $AB$. Note that $C'$ will also be the midpoint of $AB$. Now,

$DB\cdot DA=(DC'-C'B)(DC'+C'B)=DC'^2-C'B^2$

Now $CC'^2=CD^2-C'D^2=CB^2-C'B^2$ both by Pythagoras theorem on respective $\triangle CC'D$ and $\triangle CC'B$. Using this,

$DB \cdot DA=CD^2-CB^2=CF^2-CA^2=(CF-CA)(CF+CA)=GF\cdot AF$

Therefore,

$\displaystyle \frac{GF}{DB}=\frac{DA}{AF}$

(Alternately, we can construct a circle centered at $C$ with $CA$ as radius. Because both $D$ and $F$ are equidistant from $C$, tangents from $D$ and $F$ to the circle will be of the same length. In other words, both points have the same power w.r.t the circle. But it is apparent from the configuration that $\mathcal{P}(C,D)=DB\cdot DA$ and $\mathcal{P}(C,F)=FG \cdot FA$)

Using $\eqref{feqn}$, we finally have $\displaystyle \frac{GH}{GF}=\frac{GF}{BD}=\frac{BD}{BH}$

Because $ABHG$ is a parallelogram, $BH=AG=2GH$. Therefore,

$\displaystyle \left(\frac{GF}{GH}\right)^3=\frac{GF}{GH}\frac{GF}{GH}\frac{GF}{GH}=\frac{GF}{GH}\frac{BD}{GF}\frac{BH}{BD}=\frac{BH}{GH}=2$

Using the fact that $GF=CE$ and $GH=AB$, we finally have

$CE^3=2AB^3$ (or) $CE=\sqrt[3]{2}AB$
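The construction can also be checked numerically. The following Python sketch places the figure in coordinates of my own choosing ($B$ at the origin, $C=(1,0)$, $A=(1/2,\sqrt{3}/2)$, so the perpendicular through $B$ is the $y$-axis) and imitates the neusis by bisecting on $BD$ until $ED=AB=1$. The function names are illustrative only.

```python
from math import sqrt

def lengths(t):
    """Return (ED, CE) when BD = t, with B=(0,0), C=(1,0), A=(1/2, sqrt(3)/2)."""
    c = (1.0, 0.0)
    d = (-t / 2, -t * sqrt(3) / 2)       # D on line AB, beyond B
    u = c[0] / (c[0] - d[0])             # where segment CD crosses the y-axis
    e = (0.0, c[1] + u * (d[1] - c[1]))  # E on the perpendicular through B
    ed = sqrt((e[0] - d[0]) ** 2 + (e[1] - d[1]) ** 2)
    ce = sqrt((e[0] - c[0]) ** 2 + (e[1] - c[1]) ** 2)
    return ed, ce

def neusis_ce():
    """Bisect on BD until ED equals the side length AB = 1, then return CE."""
    lo, hi = 1.0, 2.0  # ED < 1 at BD = 1 and ED > 1 at BD = 2
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if lengths(mid)[0] < 1.0:
            lo = mid
        else:
            hi = mid
    return lengths(0.5 * (lo + hi))[1]

ce = neusis_ce()
print(ce, 2 ** (1 / 3))
```

The bisection plays the role of sliding the marked ruler; the geometric proof above explains why the resulting $CE$ is exactly $\sqrt[3]{2}$.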

Until then

Yours Aye

Thursday, May 20, 2021

Birthday Problem without Replacement

In one of the previous posts, we saw the generalization of Coupon collector's problem without replacement. Not long after, I thought about the same generalization for the Birthday problem.

Because a Deck of cards is a better platform for dealing with problems without replacement, we use the same setup here. That is, we seek the expected number of cards we need to draw (without replacement) from a well shuffled pack of cards to get two cards of the same rank.

This problem is a lot easier. We have $n=13$ ranks each with $j=4$ instances (suits). Using the Hypergeometric distribution, the probability that we need more than $k$ cards is

$\displaystyle \mathbb{P}(X>k)=\frac{\binom{13}{k}\cdot 4^k}{\binom{52}{k}}$

That is, of the $k$ cards we select from $52$ cards, we can have at most one instance for each rank. Therefore, the $k$ cards should come from $k$ of the $13$ ranks and for each card, we have $4$ choices.

Now the expected value is obtained easily.

$\displaystyle \mathbb{E}(X)=\sum_{k=0}^n \mathbb{P}(X>k)=\sum_{k=0}^n \frac{\binom{n}{k}\cdot j^k}{\binom{nj}{k}}=1+\sum_{k=1}^nj^k\binom{n}{k}\binom{nj}{k}^{-1}$
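The finite sum is small enough to evaluate exactly. Here is a short Python sketch (the function name is hypothetical) that sums $\mathbb{P}(X>k)$ with exact fractions for $n$ ranks of $j$ cards each.

```python
from fractions import Fraction
from math import comb

def expected_draws_until_repeat(n, j):
    """E(X) = sum_{k=0}^{n} C(n,k) j^k / C(nj,k) for n ranks with j cards each."""
    return sum(Fraction(comb(n, k) * j**k, comb(n * j, k)) for k in range(n + 1))

deck_value = expected_draws_until_repeat(13, 4)
print(deck_value, float(deck_value))
```

For a standard deck this evaluates to roughly $5.6965$ cards, and the degenerate case of a single rank with two cards gives exactly 2, as it should.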

Needless to say, this agrees with the classical expression (that is at $j \to \infty$) given in Pg 417 (or Pg 433) of 'Analytic Combinatorics'. We attempt to find asymptotics when $n\to\infty$.

Let $\displaystyle a_k=j^k\binom{n}{k}\binom{nj}{k}^{-1}$ for $k\geq 1$

Then, using the beautiful idea I learned from Hirschhorn's page (for example, papers 176 and 197),

$\begin{align}\displaystyle a_k &= j^k\binom{n}{k}\binom{nj}{k}^{-1}\\ &= \frac{n}{nj}\frac{n-1}{nj-1}\cdots\frac{n-k+1}{nj-k+1}j^k \\ &= \frac{1-\frac{0}{n}}{1-\frac{0}{nj}}\frac{1-\frac{1}{n}}{1-\frac{1}{nj}}\cdots \frac{1-\frac{k-1}{n}}{1-\frac{k-1}{nj}} \\ &\approx \text{exp}\left\{-\frac{0}{n}-\frac{1}{n}-\cdots -\frac{k-1}{n}+\frac{0}{nj}+\frac{1}{nj}+\cdots +\frac{k-1}{nj}\right\} \\ &\approx \text{exp}\left\{-\frac{1-j^{-1}}{n}\frac{k^2}{2}\right\} \\ \end{align}$

This is nothing but Laplace's method widely discussed in literature and especially in my favorite book 'Analytic Combinatorics' (page 755).

Note that we have used the idea that $(1+x)^m=e^{m\log(1+x)}\approx e^{mx}$ for small $x$. Now,

$\displaystyle \mathbb{E}(X)=1+\sum_{k=1}^n a_k \approx 1+\int\limits_0^\infty \text{exp}\left\{-\frac{1-j^{-1}}{n}\frac{k^2}{2}\right\}\,dk=1+\sqrt{\frac{\pi}{2}}\sqrt{\frac{n}{1-j^{-1}}}$

where we've used the standard Normal integral. For large $j$, this clearly reduces to the classical asymptotic value.

In fact, for large $n$, the asymptotic expansion of the binomial coefficient is

$\begin{align}\displaystyle \binom{n}{k} & \approx \frac{n^k}{k!}\text{exp}\left\{-\frac{S_1(k-1)}{n}-\frac{S_2(k-1)}{2n^2}-\frac{S_3(k-1)}{3n^3}\cdots\right\} \\ & \approx \frac{n^k}{k!}\text{exp}\left\{-\frac{k(k-1)}{2n}-\frac{k^3}{6n^2}-\frac{k^4}{12n^3}\cdots\right\}\\ & \approx \frac{n^k}{k!}\text{exp}\left\{-\frac{k^2}{2n}-\frac{k^3}{6n^2}-\frac{k^4}{12n^3}\cdots\right\}\\ \end{align}$

where $S_r(m)$ is the sum of the $r$-th powers of the first $m$ natural numbers.

Using the second expression and simplifying a bit more, we have

$\displaystyle a_k \approx \text{exp}\left\{\frac{1-j^{-1}}{8n}\right\}\text{exp}\left\{-\frac{(1-j^{-1})(k-\frac{1}{2})^2}{2n}\right\}\left(1-\frac{1-j^{-2}}{6n^2}k^3-\frac{1-j^{-3}}{12n^3}k^4+\frac{1}{2}\frac{(1-j^{-2})^2}{36n^4}k^6+\cdots\right)$

If we now use the Normal approximation, we get

$\begin{align}\displaystyle \mathbb{E}(X) &\approx 1+\text{exp}\left\{\frac{1-j^{-1}}{8n}\right\}\left(\sqrt{\frac{n\pi/2}{1-j^{-1}}}-\frac{1}{3}\frac{j+1}{j-1}-\frac{1}{4}\frac{1-j^{-3}}{(1-j^{-1})^{5/2}}\sqrt{\frac{\pi}{2n}}+\frac{5}{24}\frac{(1-j^{-2})^2}{(1-j^{-1})^{7/2}}\sqrt{\frac{\pi}{2n}}\right) \\ &= 1+\text{exp}\left\{\frac{1-j^{-1}}{8n}\right\}\left(\sqrt{\frac{n\pi/2}{1-j^{-1}}}-\frac{1}{3}\frac{j+1}{j-1}-\frac{1}{24}\frac{1-4j^{-1}+j^{-2}}{(1-j^{-1})^2}\sqrt{\frac{1-j^{-1}}{2n/\pi}} \right) \\ &\approx 1+\sqrt{\frac{n\pi/2}{1-j^{-1}}}-\frac{1}{3}\frac{j+1}{j-1}+\frac{1}{12}\frac{1-j^{-1}+j^{-2}}{(1-j^{-1})^2}\sqrt{\frac{1-j^{-1}}{2n/\pi}} \\ \end{align}$

which I think is an $O(n^{-1})$ approximation. Hope you liked the discussion.

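The following Mathematica code compares the exact sum with this approximation.

Clear["Global`*"];
n = 10000; j = 7;
acc = 50;
(* exact value: 1 + Sum of a_k, accumulated term by term *)
res = N[1, acc];
k = 1; nume = N[n, acc]; deno = N[n, acc];
Monitor[Do[
   res += N[nume/deno, acc];
   nume *= (n - k); deno *= (n - k/j);
   , {k, n}];, {k, res}]; // AbsoluteTiming
res
(* asymptotic approximation, second line of the expansion above *)
N[1 + Exp[(1 - j^-1)/(8 n)] (Sqrt[(n \[Pi]/2)/(1 - j^-1)] - 1/3 (1 + j^-1)/(1 - j^-1) -
    1/24 (1 - 4 j^-1 + j^-2)/(1 - j^-1)^2 Sqrt[(1 - j^-1)/(2 n/\[Pi])]), acc]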

Until then

Yours Aye

Friday, May 14, 2021

Coupon collector's Problem without Replacement

In this post, we seek the expected number of coupons needed to complete the set where the number of coupons in each type is finite and we sample without replacement.

I recently got interested in this question when I saw a post on Reddit that asks for the expected number of cards that have to be drawn without replacement from a deck of cards to get all the four suits. This is essentially the coupon collector's problem with 4 different coupon types with 13 coupons available in each type.

Let's first discuss a simpler question. What is the expected number of cards to be drawn without replacement from a well shuffled deck to collect the first Ace? This is exactly like that of the Geometric distribution but without replacement.

We can employ symmetry to simplify this problem. The four Aces would divide the remaining 48 cards into five sets which, by symmetry, have the same expected size. Therefore each set would contain 48/5=9.6 cards on average. Thus, including the first Ace we collect, we need 10.6 draws on average without replacement.

What we've discussed here is the Negative Hypergeometric distribution which deals with the number of draws without replacement needed to get the $r$-th success. Analogous to the Geometric distribution, let's also define the Negative Hypergeometric distribution with $r=1$ as the Negative Geometric distribution.

If $X$ is a Negative Geometric random variable denoting the number of draws without replacement needed to get the first success, then based on our discussion above, we have

$\displaystyle\mathbb{E}(X)=\frac{N-K}{K+1}+1=\frac{N+1}{K+1}$

where $N$ is the population size and $K$ is the number of "successes" in the population.

That is all we need to solve our original problem. Even though finding the expected number of draws without replacement to get all suits has been solved many times on Stack Exchange (for example, here, here and here by Marko Riedel with Generating functions), we are going to use the amazing Maximums-Minimums Identity approach used here.

Let $X_1$, $X_2$, $X_3$ and $X_4$ be the random number of draws without replacement needed to get the first Spades, Clubs, Hearts and Diamonds respectively. Then the random number $X=\text{max}(X_1,X_2,X_3,X_4)$ denotes the number of draws to get all the four suits.

Note that each $X_i$ is a Negative geometric variable and the minimum of any bunch is again Negative Geometric with the number of successes pooled together. Using the Max-Min Identity and the linearity of expectations, we have

$\begin{align}\displaystyle\mathbb{E}[X]&=\mathbb{E}[\text{max }X_i]\\ &=\sum_i \mathbb{E}[X_i] - \sum_{i<j}\mathbb{E}[\text{min}(X_i,X_j)]+\sum_{i<j<k}\mathbb{E}[\text{min}(X_i,X_j,X_k)]-\cdots\\ &= \binom{4}{1}\frac{52+1}{13+1}-\binom{4}{2}\frac{52+1}{26+1}+\binom{4}{3}\frac{52+1}{39+1}-\binom{4}{4}\frac{52+1}{52+1}\\ &= \frac{4829}{630} \approx 7.66508\end{align}$

Though we have solved the question in the case where each type has an equal number of coupons, it should be easy to see that this approach generalizes readily.

For the case of $n$ different coupon types with $j$ coupons in each type, we have,

$\displaystyle \mathbb{E}(X)=\sum_{k=1}^n(-1)^{k-1}\binom{n}{k}\frac{nj+1}{kj+1}=(nj+1)\left[1-\binom{n+1/j}{n}^{-1}\right]$

which is the closed form solution of the finite version of the coupon collector's problem. We know that as $j\to \infty$, this reduces to the classical coupon collector problem. For the case of $n\to \infty$, using the Asymptotic bounds of the Binomial Coefficient, we can write,

$\displaystyle \mathbb{E}(X) \approx nj \left[1-\binom{n+1/j}{n}^{-1} \right] \approx nj - \Gamma(1/j)n^{1-1/j}$
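Both the Max-Min sum and the closed form can be checked with exact arithmetic. In the Python sketch below (function names are my own), the generalized binomial coefficient $\binom{n+1/j}{n}$ is computed as the product $\prod_{k=1}^{n}(kj+1)/(kj)$.

```python
from fractions import Fraction
from math import comb

def expected_draws_max_min(n, j):
    """Max-Min identity: sum_k (-1)^(k-1) C(n,k) (nj+1)/(kj+1)."""
    total = n * j
    return sum(
        (-1) ** (k - 1) * comb(n, k) * Fraction(total + 1, k * j + 1)
        for k in range(1, n + 1)
    )

def expected_draws_closed_form(n, j):
    """(nj+1) * [1 - 1/C(n + 1/j, n)], with the binomial written as a product."""
    binom = Fraction(1)
    for k in range(1, n + 1):
        binom *= Fraction(k * j + 1, k * j)
    return (n * j + 1) * (1 - 1 / binom)

first_ace = Fraction(52 + 1, 4 + 1)  # Negative Geometric mean, N = 52, K = 4
all_suits = expected_draws_max_min(4, 13)
print(first_ace, all_suits, expected_draws_closed_form(4, 13))
```

For the deck of cards both functions return $4829/630$, matching the value derived above.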

Even though I knew how the Max-Min Identity is useful in terms of the coupon collector problem with replacement, it took me almost a day of on-and-off thinking before I could convince myself that we can make it work for the without replacement case as well. And it was a pleasant surprise that we could arrive at a nice asymptotic expression for the expectation.

Until Then

Yours Aye

Friday, April 30, 2021

A Probability problem on an Election result

A friend of mine recently posed the following problem to me: Given two contestants in an election $X$ and $Y$ (with equal popularity among the voters), what is the probability that the contestant leading the election after 80% of polling eventually loses the election?

I misunderstood the question in more ways than one and, not wanting to use paper and pencil, solved this problem with an answer of $(2/\pi)\tan^{-1}4$, which was wrong.

I asked him for the source of the problem, which also has a solution. But the solution there seems very convoluted and needs Wolfram Alpha to get a closed form. Seeing the solution, I realized that I had misunderstood the question and that my method is a lot easier.

Let $X_1,Y_1$ be the votes received by the two contestants after 80% of polling and $X_2,Y_2$ be their votes in the remaining 20% of polling. For the sake of simplicity, let the total number of votes be $20n$ for a large $n$.

We know that, if $X\sim \text{Bin}(m,p)$, then for large $m$, $X$ is approximately distributed as $\mathcal{N}(mp,mpq)$. Therefore, for $p=q=1/2$,

$\displaystyle X_1,Y_1\sim \mathcal{N}\left(\frac{16n}{2}, \frac{16n}{4}\right)$ and $\displaystyle X_2,Y_2 \sim \mathcal{N}\left(\frac{4n}{2}, \frac{4n}{4}\right)$

Let $E$ be the event denoting the player trailing after 80% polling eventually wins the election. Then,

$\displaystyle \mathbb{P}(E)=2\cdot \frac{1}{2}\cdot \mathbb{P}(X_1+X_2 \leq Y_1+Y_2 \mid X_1 \geq Y_1)$

where the factor $2$ accounts for either contestant leading and $1/2$ is the probability that $X$ is the one leading. We can rewrite the same to get

$\mathbb{P}(E)=\mathbb{P}(X_1-Y_1 \leq Y_2 - X_2 \mid X_1-Y_1 \geq 0)$

We also know that if $U\sim \mathcal{N}(\mu_1,\sigma_1^2)$ and $V\sim \mathcal{N}(\mu_2,\sigma_2^2)$, then $aU+bV\sim \mathcal{N}(a\mu_1+b\mu_2,a^2\sigma_1^2+b^2\sigma_2^2)$

Therefore, $X_1-Y_1 \sim \mathcal{N}\left(0,8n\right)=\sqrt{8n}Z_1$ and $Y_2-X_2\sim \mathcal{N}\left(0,2n\right)=\sqrt{2n}Z_2$

where $Z_1$ and $Z_2$ are independent standard Normal variables.

Therefore,

$\begin{align}\displaystyle \mathbb{P}(E)&=\mathbb{P}(2\sqrt{2n}Z_1\leq \sqrt{2n}Z_2 \mid 2\sqrt{2n}Z_1 \geq 0)\\ &=\mathbb{P}(2Z_1\leq Z_2 \mid Z_1 \geq 0) \\ &= 2\,\mathbb{P}(0 \leq Z_1 \leq Z_2/2)\\ &= \mathbb{P}\left(0 \leq \frac{Z_1}{Z_2} \leq \frac{1}{2}\right)\\ &= \mathbb{P}\left(0 \leq W \leq \frac{1}{2}\right)\\ \end{align}$

Here the third line uses $\mathbb{P}(Z_1\geq 0)=1/2$, and the fourth uses the fact that the ratio lies in $[0,1/2]$ either when $0 \leq Z_1 \leq Z_2/2$ or when $Z_2/2 \leq Z_1 \leq 0$, two events that are equally likely by symmetry.

where $W$ is the ratio of two independent standard Normal variables and hence a Cauchy random variable. As the CDF of a Cauchy variable has a closed form in terms of the arctangent function, we finally have

$\displaystyle \mathbb{P}(E)=\frac{1}{\pi}\tan^{-1}\left(\frac{1}{2}\right)\approx 0.1476$

More generally, if $E$ denotes the event that the contestant trailing when a fraction $a$ of the votes remains to be counted eventually wins the election, then

$\displaystyle \mathbb{P}(E)=\frac{1}{\pi}\tan^{-1}\left(\sqrt{\frac{a}{1-a}}\right)=\frac{1}{\pi}\sin^{-1}\sqrt{a}$
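The general formula can be checked with a quick Monte Carlo run. The Python sketch below works directly with the Normal approximation (an assumption of the sketch, matching the derivation rather than simulating individual votes): the lead after the first $1-a$ of the votes and the net swing in the last $a$ are drawn as independent zero-mean Normals with variances proportional to $1-a$ and $a$. Function names, trial count and seed are arbitrary.

```python
import random
from math import asin, atan, pi, sqrt

def comeback_probability(a):
    """Closed form: (1/pi) * asin(sqrt(a)) for a fraction a of votes remaining."""
    return asin(sqrt(a)) / pi

def simulate(a, trials=200_000, seed=1):
    """Monte Carlo under the Normal approximation used in the derivation."""
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        early = rng.gauss(0.0, sqrt(1.0 - a))  # lead after the first 1 - a of votes
        late = rng.gauss(0.0, sqrt(a))         # net swing in the remaining votes
        if early * (early + late) < 0:         # the final lead has the opposite sign
            flips += 1
    return flips / trials

print(comeback_probability(0.2), atan(0.5) / pi, simulate(0.2))
```

With $a=0.2$ the closed form coincides with $(1/\pi)\tan^{-1}(1/2)$, and the simulated frequency should land close to it.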

Hope you enjoyed the discussion. See ya in the next post.

Until Then

Yours Aye

Saturday, April 10, 2021

Expected Value in terms of CDF

It is well known that the expectation of a non-negative random variable $X$ can be written as

$\displaystyle \mathbb{E}[X] \overset{\underset{\mathrm{d}}{}}{=} \sum_{k=0}^\infty \mathbb{P}(X>k) \overset{\underset{\mathrm{c}}{}}{=} \int\limits_0^\infty \mathbb{P}(X>x)\,dx$

for the discrete and continuous cases.

It's quite easy to prove this, at least in the discrete case. It is interesting that, in the same vein, this can be extended to arbitrary functions. That is,

$\begin{align} \displaystyle \mathbb{E}[g(X)]&=\sum_{k=0}^\infty g(k)\mathbb{P}(X=k)\\ &=\sum_{k=0}^\infty \Delta g(k)\mathbb{P}(X>k)\\ \end{align}$

where $\Delta g(k)=g(k+1)-g(k)$ is the forward difference operator. We have also assumed, without loss of generality, that $g(0)=0$.
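The identity is easy to check on a small example. The snippet below is a sketch with an arbitrarily chosen distribution ($X\sim\text{Bin}(6,1/3)$) and function ($g(k)=k^2$); neither comes from the discussion above, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def expectation_direct(pmf, g):
    """E[g(X)] as the usual sum of g(k) * P(X = k)."""
    return sum(g(k) * p for k, p in enumerate(pmf))


def expectation_tail(pmf, g):
    """E[g(X)] as g(0) + sum of (g(k+1) - g(k)) * P(X > k)."""
    total = g(0)            # vanishes under the assumption g(0) = 0
    tail = Fraction(1)
    for k, p in enumerate(pmf):
        tail -= p           # tail is now P(X > k)
        total += (g(k + 1) - g(k)) * tail
    return total


# Illustrative choice: X ~ Bin(6, 1/3) and g(k) = k^2
pmf = [comb(6, k) * Fraction(1, 3) ** k * Fraction(2, 3) ** (6 - k) for k in range(7)]
square = lambda k: k * k
print(expectation_direct(pmf, square), expectation_tail(pmf, square))  # prints 16/3 16/3
```

Exact fractions are used so that the two sums can be compared without rounding.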

Comparing this with continuous case, we can make an 'educated guess' that

$\displaystyle \mathbb{E}[g(X)]=\int\limits_0^\infty \mathbb{P}(X>x)\,dg(x)$
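As a quick check of this guess on one example (the choice of distribution and of $g$ is purely illustrative), take $X\sim\text{Exp}(\lambda)$ and $g(x)=x^2$. Then $\mathbb{P}(X>x)=e^{-\lambda x}$ and $dg(x)=2x\,dx$, so

$\displaystyle \int\limits_0^\infty \mathbb{P}(X>x)\,dg(x)=\int\limits_0^\infty 2x\,e^{-\lambda x}\,dx=\frac{2}{\lambda^2}$

which agrees with $\mathbb{E}[X^2]=\text{Var}(X)+\mathbb{E}[X]^2=\frac{1}{\lambda^2}+\frac{1}{\lambda^2}$.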

We made a post about estimating a sum with probability where we showed the expected error in the approximation is given by

$\displaystyle \mathbb{E}(\delta)=\sum_{k=1}^\infty f(k)\frac{\binom{n-k}{m}}{\binom{n}{m}}$

Note that the term involving the ratio of binomial coefficients can be interpreted as the probability of the minimum of the $m$-tuple being greater than $k$. Therefore,

$\displaystyle \mathbb{E}(\delta)=\sum_{k=1}^\infty f(k)\mathbb{P}(Y>k)$

where $Y$ denotes the smallest order statistic.
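The interpretation of the binomial ratio can be confirmed by brute force. This sketch assumes the sample consists of $m$ distinct values drawn uniformly from $\{1,\dots,n\}$, which is a reading of the setup rather than something restated here; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def min_exceeds_formula(n, m, k):
    """P(minimum > k) as the ratio of binomial coefficients."""
    return Fraction(comb(n - k, m), comb(n, m))


def min_exceeds_brute(n, m, k):
    """Same probability by listing every m-subset of {1, ..., n}."""
    subsets = list(combinations(range(1, n + 1), m))
    hits = sum(1 for s in subsets if min(s) > k)
    return Fraction(hits, len(subsets))


for k in range(0, 9):
    print(k, min_exceeds_formula(8, 3, k), min_exceeds_brute(8, 3, k))
```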

Comparing this with our expression for expectation, we see that the expected value of the (probabilistic) Right Riemann sum is

$\displaystyle \mathbb{E}[\text{Right Riemann sum}] \overset{\underset{\mathrm{d}}{}}{=} \mathbb{E}\left[\sum_{j=Y}^n f(j)\right] \overset{\underset{\mathrm{c}}{}}{=} \mathbb{E}\left[ \int\limits_Y^1 f(x)\,dx \right]$

Without going into further calculations, I'm guessing that

(i) $\displaystyle \mathbb{E}[\text{Left Riemann sum}] \overset{\underset{\mathrm{d}}{}}{=} \mathbb{E}\left[\sum_{j=0}^Z f(j)\right] \overset{\underset{\mathrm{c}}{}}{=} \mathbb{E}\left[ \int\limits_0^Z f(x)\,dx \right]$

(ii) $\displaystyle \mathbb{E}[\text{error in Trapezoidal sum}] \overset{\underset{\mathrm{d}}{}}{=} \frac{1}{2}\mathbb{E}\left[\sum_{j=Y}^Z f(j)\right] \overset{\underset{\mathrm{c}}{}}{=} \frac{1}{2}\mathbb{E}\left[ \int\limits_Y^Z f(x)\,dx \right]$

where $Z$ denotes the largest order statistic.

Hope you enjoyed the discussion. See ya in the next post.

Until then

Yours Aye

Monday, January 18, 2021

Probability on a standard deck of cards

I'm a great fan of Probability and Combinatorics. The question that we are going to solve in this post remains one of my favorite questions in this area. I've been meaning to solve this one for so long, and I'm glad I finally did it recently.

Consider a standard deck of 52 cards that is randomly shuffled. What is the probability that we do not have any King adjacent to any Queen after the shuffle?

Like I said, there were multiple instances where I would think about this problem for a long time and then get distracted elsewhere. This time Possibly Wrong brought it back to my attention. There is already a closed form solution there but I couldn't understand it. So, I decided to solve it with generating functions.

Stripping the problem down of all labels, the equivalent problem is to consider the number of words that we can form from the alphabet $\mathcal{A}=\{a,b,c\}$ with no occurrence of the sub-word $ab$ or $ba$. The $a$'s correspond to the Kings in the deck, $b$'s to the Queens and $c$'s to the rest of the cards.

Two important points here: First, we have to specifically find the number of words with 4 $a$'s, 4 $b$'s and 44 $c$'s. Second, convince yourself that both the problems are equivalent.

Pattern avoidance problems are easily tackled with the idea presented on page 60 of the 'amazing' Analytic Combinatorics.

Following the same, let $\mathcal{S}$ be the language of words with no occurrence of $ab$ or $ba$, let $\mathcal{T}_{ab}$ be the words that end with $ab$ but have no other occurrence of $ab$ or $ba$, and likewise let $\mathcal{T}_{ba}$ be the words that end with $ba$ but have no other occurrence of $ab$ or $ba$.

Appending a letter from $\mathcal{A}$ to $\mathcal{S}$, we find a non-empty word either in $\mathcal{S}$, $\mathcal{T}_{ab}$ or $\mathcal{T}_{ba}$. Therefore,

$\mathcal{S}+\mathcal{T}_{ab}+\mathcal{T}_{ba}=\{\epsilon\}+\mathcal{S}\times\mathcal{A}$

Appending $ab$ to a word in $\mathcal{S}$, we get either a word from $\mathcal{T}_{ab}$ or a word from $\mathcal{T}_{ba}$ followed by $b$. Therefore,

$\mathcal{S}\times ab=\mathcal{T}_{ab}+\mathcal{T}_{ba}b$

Similarly, we have, $\mathcal{S}\times ba=\mathcal{T}_{ba}+\mathcal{T}_{ab}a$

The three equations in terms of OGFs become,

$S+T_{ab}+T_{ba}=1+S\cdot(a+b+c)$

$S\cdot ab=T_{ab}+T_{ba}\cdot b$

$S\cdot ba=T_{ba}+T_{ab}\cdot a$

We have three equations in three unknowns. Solving for $S$, we get,

$\displaystyle S=\frac{1-ab}{(1-a)(1-b)-(1-ab)c}$

which should be regarded as the generating function in terms of variables $a$, $b$ and $c$. So the coefficient of $a^ib^jc^k$ gives the number of words that avoid both $ab$ and $ba$ using $i$ $a$'s, $j$ $b$'s and $k$ $c$'s.
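For completeness, the elimination behind the expression for $S$ can be sketched as follows. Solving the last two equations for $T_{ab}$ and $T_{ba}$ gives

$\displaystyle T_{ab}=\frac{ab(1-b)}{1-ab}S$ and $\displaystyle T_{ba}=\frac{ab(1-a)}{1-ab}S$

Substituting these into the first equation,

$\displaystyle S\left(1-a-b-c+\frac{ab(2-a-b)}{1-ab}\right)=1$

and since $(1-a-b)(1-ab)+ab(2-a-b)=(1-a)(1-b)$, multiplying through by $1-ab$ gives $S\left[(1-a)(1-b)-(1-ab)c\right]=1-ab$.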

The coefficient of $c^k$ in this generating function is elementary. We get,

$\displaystyle [c^k]S=\frac{(1-ab)^{k+1}}{(1-a)^{k+1}(1-b)^{k+1}}=\left(\frac{1-ab}{(1-a)(1-b)}\right)^{k+1}$

Something interesting happens here. We note that

$\displaystyle \frac{1-ab}{(1-a)(1-b)}=1+a+b+a^2+b^2+a^3+b^3+a^4+b^4+\cdots$

Therefore the coefficient of $a^ib^j$ in the expansion $[c^k]S$ above has the following interpretation: It is the number of ways that a bipartite number $(i,j)$ can be written as sum of $k+1$ bipartite numbers of the form $(u,0)$ and $(0,v)$ with $u,v\geq0$.

This can be achieved with simple combinatorics. Of the $k+1$ numbers, choose $m$ numbers for $i$. Distribute $i$ among those $m$ numbers such that each number gets at least $1$. Distribute $j$ among the remaining $k+1-m$ numbers.

Therefore,

$\displaystyle [a^ib^jc^k]S=\sum_{m=1}^i \binom{k+1}{m}\binom{i-1}{m-1}\binom{j+k-m}{j}$

There we have it!

But, wait a minute.. The final answer gives us a very simple combinatorial way of arriving at the answer. Just lay down all the $k$ $c$-cards. Now there will be a total of $k+1$ gaps between and around those cards. Choose $m$ among them and distribute the $i$ $a$-cards in those $m$ gaps such that each gap gets at least one card. Now distribute the $j$ $b$-cards in the remaining $k+1-m$ gaps.

Knowing this, I feel stupid to have gone through the generating function route to get the solution. Grrr...
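The gap-counting formula can be cross-checked against a direct count. The sketch below compares the closed form with a memoized letter-by-letter recursion; both function names are hypothetical, and the closed form is only evaluated for $i,j\geq 1$.

```python
from functools import lru_cache
from math import comb


def count_by_gaps(i, j, k):
    """Closed-form count of words avoiding ab and ba; assumes i >= 1 and j >= 1."""
    return sum(
        comb(k + 1, m) * comb(i - 1, m - 1) * comb(j + k - m, j)
        for m in range(1, min(i, k + 1) + 1)
    )


@lru_cache(maxsize=None)
def count_by_recursion(i, j, k, last=""):
    """Build the word one letter at a time, never placing a next to b."""
    if i == 0 and j == 0 and k == 0:
        return 1
    total = 0
    if i > 0 and last != "b":
        total += count_by_recursion(i - 1, j, k, "a")
    if j > 0 and last != "a":
        total += count_by_recursion(i, j - 1, k, "b")
    if k > 0:
        total += count_by_recursion(i, j, k - 1, "c")
    return total


print(count_by_gaps(4, 4, 44) == count_by_recursion(4, 4, 44))
```

The recursion knows nothing about gaps, so agreement between the two is a reasonably independent check.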

Anyway, to get the desired probability we use $i=j=4$ and $k=44$,

$\displaystyle \mathbb{P}(\text{No King and Queen adjacent})=\frac{4!4!44!\sum_{m=1}^4 \binom{44+1}{m}\binom{4-1}{m-1}\binom{4+44-m}{4}}{52!}$

which matches with the answer given in the quoted page.

One more small note before we conclude. Take any two Aces out of the deck. Now the probability of no King adjacent to any Queen in this pack of 50 cards is given by using $i=j=4$ and $k=42$ which is $\approx 0.499087$, surprisingly close to $1/2$. Very nearly a fair bet!
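Both probabilities are quick to evaluate exactly. The following is a minimal sketch of the formula above, with a hypothetical function name and assuming $i,j\geq 1$.

```python
from fractions import Fraction
from math import comb, factorial


def no_adjacent_probability(i, j, k):
    """P(no a next to any b) for i a's, j b's and k c's; assumes i, j >= 1."""
    words = sum(
        comb(k + 1, m) * comb(i - 1, m - 1) * comb(j + k - m, j)
        for m in range(1, min(i, k + 1) + 1)
    )
    # Number of distinct arrangements of the multiset (a multinomial coefficient)
    arrangements = factorial(i + j + k) // (factorial(i) * factorial(j) * factorial(k))
    return Fraction(words, arrangements)


full_deck = no_adjacent_probability(4, 4, 44)          # standard 52-card deck
two_aces_removed = no_adjacent_probability(4, 4, 42)   # 50-card deck
print(float(full_deck), float(two_aces_removed))
```

The first number is the probability of no King-Queen adjacency in the full deck; its complement is the probability of at least one adjacency.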

I'm convinced that the author of the page left the answer in a specific form as a hint for someone attempting to solve the problem, but it didn't help me. Still, I'm glad that I was able to derive a simpler form of the solution with some intuition on why the solution looks the way it does. Hope you enjoyed this discussion.

Clear["Global`*"];

(* 4 a's, 4 b's, 44 c's: coefficient from the generating function *)
SeriesCoefficient[
  1/(1 - a - b - c + (a b (a - 1 + b - 1))/(a b - 1)),
  {a, 0, 4}, {b, 0, 4}, {c, 0, 44}]/Multinomial[4, 4, 44]
1 - N[%]
(* Same case from the closed-form sum *)
1 - NSum[Binomial[44 + 1, m] Binomial[4 - 1, m - 1] Binomial[4 + 44 - m, 4], {m, 4}]/Multinomial[4, 4, 44]

(* 4 a's, 12 b's, 36 c's *)
SeriesCoefficient[
  (1 - a b)/((1 - a) (1 - b) - (1 - a b) c),
  {a, 0, 4}, {b, 0, 12}, {c, 0, 36}]/Multinomial[4, 12, 36]
1 - N[%]
1 - NSum[Binomial[36 + 1, m] Binomial[4 - 1, m - 1] Binomial[12 + 36 - m, 12], {m, 4}]/Multinomial[4, 12, 36]

(* 4 a's, 16 b's, 32 c's *)
SeriesCoefficient[
  (1 - a b)/((1 - a) (1 - b) - (1 - a b) c),
  {a, 0, 4}, {b, 0, 16}, {c, 0, 32}]/Multinomial[4, 16, 32]
1 - N[%]
1 - NSum[Binomial[32 + 1, m] Binomial[4 - 1, m - 1] Binomial[16 + 32 - m, 16], {m, 4}]/Multinomial[4, 16, 32]

Until then

Yours Aye

Saturday, January 16, 2021

Probability on a Contingency Table

Contingency tables are quite common in understanding classification problems, such as evaluating an ML model or a new drug tested against a disease. Given that we are just recovering from a pandemic, let's stick to the case of a Machine Learning model. In the context of ML models, the table is called the Confusion matrix and we'll use both terms interchangeably in this post.

A 2x2 contingency table usually has two columns for the binary classification (Win vs. Lose, Apple vs. Orange, White vs. Black etc.) and two rows for the model's prediction. Let's consider the classification as a 'Hypothesis' and the model's prediction as an 'Evidence' supporting it.

Here is what our contingency table looks like.

Table 1	H	⌐H
E	n(H∩E)	n(⌐H∩E)
⌐E	n(H∩⌐E)	n(⌐H∩⌐E)

where $n(A)$ denotes the number of elements in set $A$.

We can normalize this table by dividing each of the four entries by the total thereby creating a new table.

Table 2	H	⌐H
E	$\mathbb{P}(H\cap E)$	$\mathbb{P}(\neg H\cap E)$
⌐E	$\mathbb{P}(H \cap \neg E)$	$\mathbb{P}(\neg H \cap \neg E)$

where we can view each entry as the probability of a classification falling in that bracket and $\mathbb{P}(A)$ denotes the probability of event $A$. Note that the sum of all the entries of Table 2 is 1.

The Wiki page on Confusion matrix gives a huge list of metrics that can be derived out of this table. In this post, we visit a couple of probability problems created from them.

Three of the important metrics that I learned in the context of ML from these matrices are

Precision, $\displaystyle p=\frac{\mathbb{P}(H\cap E)}{\mathbb{P}(H\cap E)+\mathbb{P}(\neg H\cap E)}=\frac{\mathbb{P}(H\cap E)}{\mathbb{P}(E)}=\mathbb{P}(H|E)$

Recall, $\displaystyle r=\frac{\mathbb{P}(H\cap E)}{\mathbb{P}(H\cap E)+\mathbb{P}(H\cap \neg E)}=\frac{\mathbb{P}(H\cap E)}{\mathbb{P}(H)}=\mathbb{P}(E|H)$

and Accuracy, $a=\mathbb{P}(H \cap E)+\mathbb{P}(\neg H \cap \neg E)$

Suppose we want to create a random (normalized) confusion matrix. One way to do this would be to create four random variables, all between 0 and 1, that also sum to 1. We can use a Dirichlet distribution with four parameters to achieve this.

But there may be instances where we want to create a confusion matrix with a given precision, recall and accuracy. There are four entries in the table. Three given metrics, together with the fact that the entries should add up to 1, would seem to suggest that these completely define the table.

But not all such values can create a valid table. For example, it's impossible to create a valid table with a precision of 66.7%, recall of 90.0% and accuracy of 60%. So our first question is: given that precision, recall and accuracy are all uniformly distributed random variables, what is the probability that we will end up with a valid table?

To produce a valid table, the three variables need to satisfy the condition

$\displaystyle a \geq \frac{pr}{1-(1-p)(1-r)}$
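To see where the condition comes from, one can solve the definitions of $p$, $r$ and $a$ for the four entries and ask that none of them be negative. The sketch below does this under the notation $x=\mathbb{P}(H\cap E)$, $y=\mathbb{P}(\neg H\cap E)$, $z=\mathbb{P}(H\cap\neg E)$, $w=\mathbb{P}(\neg H\cap\neg E)$; the function name is hypothetical and degenerate inputs (such as $p=0$ or $p=r=1$) are simply rejected.

```python
def table_from_metrics(p, r, a):
    """Return (x, y, z, w) for precision p, recall r, accuracy a, or None if invalid."""
    denom = p + r - 2 * p * r
    if not (0 < p <= 1 and 0 < r <= 1 and 0 <= a <= 1) or denom == 0:
        return None  # degenerate inputs are outside the scope of this sketch
    # From p = x/(x+y), r = x/(x+z), a = x + w and x + y + z + w = 1
    x = (1 - a) * p * r / denom
    y = x * (1 - p) / p
    z = x * (1 - r) / r
    w = a - x
    if w < 0:
        return None  # equivalent to a < pr / (1 - (1-p)(1-r))
    return x, y, z, w


print(table_from_metrics(0.667, 0.9, 0.6))   # the invalid example from the text
print(table_from_metrics(0.8, 2 / 3, 0.7))   # a valid combination
```

The only entry that can go negative is $w$, and $w\geq 0$ rearranges to exactly the inequality above.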

Let $T$ be the event of getting a valid table. Using the above we have,

$\displaystyle \mathbb{P}(T)=\int\limits_0^1\int\limits_0^1\int\limits_0^1\left[ a \geq \frac{pr}{1-(1-p)(1-r)} \right]\,dp\,dr\,da$

For a moment, let's assume that $a$ is given. Then we first solve

$\displaystyle F(a)=\int\limits_0^1\int\limits_0^1\left[a \geq \frac{pr}{1-(1-p)(1-r)}\right]\,dp\,dr$

The boundary of the region described by the Iverson bracket is a curve in the $p$-$r$ plane, obtained by setting the inequality to an equality and solving for $r$. The equation of the curve is given by

$\displaystyle r=\frac{ap}{(1+a)p-a}$

Plotting the region for $a=1/4$, we get the following graph.

The region to be integrated lies between the curve and the two axes. We can divide this region along the $p=r$ line. This line intersects the graph at $\left(\frac{2a}{1+a},\frac{2a}{1+a}\right)$. Therefore,

$\displaystyle F(a)=\frac{4a^2}{(1+a)^2}+2\int\limits_{\frac{2a}{1+a}}^1\frac{a p}{(1+a)p-a}\,dp$

Solving the second integral with Wolfram Alpha, we get,

$\displaystyle F(a)=\frac{4a^2}{(1+a)^2}+2\frac{a(1-a)}{(1+a)^2}-2\frac{a^2\log{a}}{(1+a)^2}=\frac{2a}{1+a}-\frac{2a^2\log{a}}{(1+a)^2}$

Plugging this back in our original equation and integrating, we see that,

$\displaystyle \mathbb{P}(T)=\int\limits_0^1\left(\frac{2a}{1+a}-\frac{2a^2\log{a}}{(1+a)^2}\right)\,da=4-\frac{\pi^2}{3}\approx 0.710132$

Thus we see that just about 29% of the tables will not be valid. Something that truly surprised me is the fact that $\pi$ makes an appearance here. There are no circles (not even something close to that) in this problem!!
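The value $4-\pi^2/3$ can be checked in two independent ways: by sampling $(p,r,a)$ uniformly and testing the validity condition, and by integrating the closed form of $F(a)$ numerically. This is a rough sketch with hypothetical function names; the sample size, seed and step count are arbitrary.

```python
import math
import random


def valid_fraction_mc(trials=200_000, seed=7):
    """Fraction of uniform (p, r, a) triples that give a valid table."""
    rng = random.Random(seed)
    valid = 0
    for _ in range(trials):
        p, r, a = rng.random(), rng.random(), rng.random()
        denom = p + r - p * r  # equals 1 - (1-p)(1-r)
        if denom > 0 and a >= p * r / denom:
            valid += 1
    return valid / trials


def F(a):
    """Closed form of the inner double integral for a given accuracy a."""
    return 2 * a / (1 + a) - 2 * a * a * math.log(a) / (1 + a) ** 2


def valid_fraction_quadrature(steps=100_000):
    """Midpoint rule for the integral of F over (0, 1); avoids a = 0."""
    h = 1 / steps
    return h * sum(F((i + 0.5) * h) for i in range(steps))


print(valid_fraction_mc(), valid_fraction_quadrature(), 4 - math.pi ** 2 / 3)
```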

The second problem we see is quite similar. Now, assume that we create tables (like Table 2) such that the values are uniformly distributed and sum to 1. If we want the precision and recall of our random tables to be greater than some threshold, what would be the expected accuracy of the table?

For clarity, let $\mathbb{P}(H \cap E)=X$, $\mathbb{P}(\neg H \cap E)=Y$, $\mathbb{P}(H \cap \neg E)=Z$ and $\mathbb{P}(\neg H \cap \neg E)=W$, then $(X,Y,Z,W)\sim \text{Dir}(1,1,1,1)$

$\displaystyle \mathbb{E}(1-Y-Z\,|\,\mathbb{P}(E|H)\geq r,\mathbb{P}(H|E)\geq p)=\frac{1}{V}\int\limits_Q (1-y-z) \,dx\,dy\,dz$

where $Q$ is the region such that

$Q=\{(x,y,z):(1-p)x\geq py \land (1-r)x\geq rz \land x+y+z \leq 1 \land x,y,z\geq 0\}$

and $V$ is the volume enclosed by $Q$.

Evaluating this integral is not so easy. The region of integration depends on the values of $p$ and $r$, and it kind of ends in a mess of equations. But with some luck and a lot of Mathematica, we can see

$\displaystyle \mathbb{E}(\mathbb{P}(H \cap E)+\mathbb{P}(\neg H \cap \neg E)\text{ }|\text{ }\mathbb{P}(E|H)\geq r,\mathbb{P}(H|E)\geq p)=\frac{2(p+r)^2+p^3+r^3-pr(p^2+r^2)}{4(p+r)(1-(1-p)(1-r))}$

I have no way of making sense of that expression but, hey, we have an expectation on probabilities!
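The expression can at least be compared with a simulation. The sketch below samples $(X,Y,Z,W)\sim\text{Dir}(1,1,1,1)$ by normalizing four independent exponentials, keeps the tables that meet both thresholds, and averages their accuracy. Function names, seed and sample size are arbitrary choices for illustration.

```python
import random


def expected_accuracy_formula(p, r):
    """Closed-form expected accuracy given precision >= p and recall >= r."""
    num = 2 * (p + r) ** 2 + p ** 3 + r ** 3 - p * r * (p * p + r * r)
    den = 4 * (p + r) * (1 - (1 - p) * (1 - r))
    return num / den


def expected_accuracy_mc(p, r, trials=200_000, seed=11):
    """Rejection-sampling estimate over uniformly random normalized tables."""
    rng = random.Random(seed)
    kept = 0
    total = 0.0
    for _ in range(trials):
        draws = [rng.expovariate(1.0) for _ in range(4)]
        s = sum(draws)
        x, y, z, w = (v / s for v in draws)  # a Dir(1,1,1,1) sample
        if x / (x + y) >= p and x / (x + z) >= r:
            kept += 1
            total += x + w  # accuracy = 1 - y - z
    return total / kept


print(expected_accuracy_formula(0.2, 0.25), expected_accuracy_mc(0.2, 0.25))
```

Two limiting cases are also reassuring: with no constraint ($p,r\to 0$) the formula tends to $1/2$, and with $p=r=1$ it gives $1$.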

Hope you enjoyed this discussion.

Clear["Global`*"];

(* Thresholds for precision and recall *)
p = 200/1000; r = 250/1000;

(* Regions of valid tables under various combinations of constraints *)
ImpReg[a_, p_, r_] :=
  ImplicitRegion[
   y + z <= 1 - a && (1 - p) x >= p y && (1 - r) x >= r z && 0 <= x &&
    0 <= y && 0 <= z && x + y + z <= 1, {x, y, z}];
ImpRegap[a_, p_] :=
  ImplicitRegion[
   y + z <= 1 - a && (1 - p) x >= p y && 0 <= x && 0 <= y && 0 <= z &&
    x + y + z <= 1, {x, y, z}];
ImpRegpr[p_, r_] :=
  ImplicitRegion[(1 - p) x >= p y && (1 - r) x >= r z && 0 <= x &&
    0 <= y && 0 <= z && x + y + z <= 1, {x, y, z}];
ImpRegra[r_, a_] :=
  ImplicitRegion[
   y + z <= 1 - a && (1 - r) x >= r z && 0 <= x && 0 <= y && 0 <= z &&
    x + y + z <= 1, {x, y, z}];

(* Closed-form expected accuracy *)
ExpecAcc[p_, r_] := (-2 p^2 - p^3 - 4 p r + p^3 r - 2 r^2 - r^3 +
     p r^3)/(4 (p + r) (-p - r + p r));
ExpecAcc[p, r]
N[%]

(* Monte Carlo check: Dirichlet(1,1,1,1) samples via normalized exponentials *)
lim = 1000000;
cnt = 0;
val = 0;
Do[
  x = -Log[RandomReal[]]; y = -Log[RandomReal[]];
  z = -Log[RandomReal[]]; w = -Log[RandomReal[]];
  t = w + x + y + z;
  x /= t; w /= t; y /= t; z /= t;
  If[And[x/(x + y) >= p, x/(x + z) >= r],
   cnt += 1; val += 1 - y - z;
   ];
  , {i, lim}];
N[val/cnt]

Until then

Yours Aye

Thursday, January 7, 2021

Isochrone on a Spherical surface

The Cycloid in all its grandeur scooped all the glory for itself by way of being both the brachistochrone curve and the tautochrone curve.

But, seemingly out of some dark magic, the semi-cubical parabola makes its way as the (vertical) isochrone curve, a curve on which a bead sliding without friction covers equal vertical distances in equal intervals of time.

In my last post, we found the differential equation of a tautochrone curve on a spherical surface and used it to find the curve. In this post, we kind of continue the same discussion.

Our aim in this post is to find a curve on the surface of a sphere such that a bead sliding (without friction) under the influence of gravity covers equal polar angles at equal intervals of time. In other words, we are trying to find the (polar) Isochrone curve on the spherical surface.

For reasons that'll be apparent later, we modify our problem setup a little. Let's assume that the bead starts at the north pole and slides (without friction) slowly along the $\phi=0$ plane from $\theta=0$ to $\theta=\theta_0$ where the bead seamlessly enters the curve.

Let $\omega=d\theta/dt$ be the constant polar velocity as the bead enters the isochrone curve. Also, note that, because of the way the bead travels before entering the curve, the azimuthal velocity will be zero at the entry point.

Using the differential arclength of a curve on a (unit) spherical surface, we have,

$\displaystyle \left(\frac{ds}{dt}\right)^2 =\left(\frac{d\theta}{dt}\right)^2+\sin^2\theta\left(\frac{d\phi}{dt}\right)^2=\left(\frac{d\theta}{dt}\right)^2\left(1+\sin^2\theta\left(\frac{d\phi}{d\theta}\right)^2\right)$

Let the plane tangent to the sphere at the south pole be the 'base' from which we measure the gravitational potential energy. Then, using the law of conservation of energy at $\theta=\theta_0$ (just after the bead entered the isochrone) and any point beyond,

$\displaystyle \omega^2\left(1+\sin^2\theta\left(\frac{d\phi}{d\theta}\right)^2\right)+2g(1+\cos\theta)=\omega^2+2g(1+\cos\theta_0)$

Simplifying this,

$\displaystyle \frac{d\phi}{d\theta}=\frac{\sqrt{2g}}{\omega}\frac{\sqrt{\cos\theta_0-\cos\theta}}{\sin\theta}$

It is easy to express $\omega$ in terms of $\theta_0$. Using the law of conservation of energy at $\theta=0$ and $\theta=\theta_0$ (just before the bead enters the isochrone), we see that

$\omega=\sqrt{2g(1-\cos\theta_0)}$

Using this in the expression above, we finally have,

$\displaystyle \frac{d\phi}{d\theta}=\frac{1}{\sin\theta}\sqrt{\frac{\cos\theta_0-\cos\theta}{1-\cos\theta_0}}$

(WARNING: Now is the time where I give you a fair warning to keep your mind in a secured place because it is about to be blown.)

Looking at the above differential equation, it is clear that it is the same equation we found in our previous post for the tautochrone problem. The same curve solves both the problems just like the cycloid does in plane geometry!!!

This says, if you place the bead at any point on our curve, the time it takes to reach the south pole is the same and hence the curve is a tautochrone. But if you place the bead at the north pole and let it slide into the curve, then it covers equal polar angles in equal intervals of time and hence the curve is a (polar) isochrone.
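The differential equation above is easy to explore numerically. The sketch below evaluates the slope $d\phi/d\theta$, integrates it with a midpoint rule to trace $\phi(\theta)$, and checks that the energy expression stays constant along the curve. It assumes a unit sphere, takes $\phi(\theta_0)=0$, and uses $g=9.81$ purely as an illustrative value; all function names are hypothetical.

```python
import math


def dphi_dtheta(theta, theta0):
    """Slope of the isochrone for theta0 <= theta < pi."""
    ratio = (math.cos(theta0) - math.cos(theta)) / (1 - math.cos(theta0))
    return math.sqrt(ratio) / math.sin(theta)


def azimuth(theta, theta0, steps=10_000):
    """phi(theta) by the midpoint rule, taking phi(theta0) = 0."""
    h = (theta - theta0) / steps
    return h * sum(dphi_dtheta(theta0 + (i + 0.5) * h, theta0) for i in range(steps))


def energy(theta, theta0, g=9.81):
    """Left side of the energy equation: speed^2 + 2 g (1 + cos(theta))."""
    omega_sq = 2 * g * (1 - math.cos(theta0))
    speed_sq = omega_sq * (1 + (math.sin(theta) * dphi_dtheta(theta, theta0)) ** 2)
    return speed_sq + 2 * g * (1 + math.cos(theta))


theta0 = math.pi / 3
for theta in (1.2, 2.0, 3.0):
    print(theta, azimuth(theta, theta0), energy(theta, theta0))
```

The energy column should equal $4g$ throughout, which is the value for a bead released from rest at the north pole. The slope grows like $1/\sin\theta$ near the south pole, so the integration is kept away from $\theta=\pi$.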

Truly, some dark magic stuff going on. I would have never expected in any way that such a weird coincidence would happen in the spherical case. I truly enjoyed this. Hope you did too. See you in the next post.

Until Then

Yours Aye