I write, therefore I am

Saturday, April 10, 2021

Expected Value in terms of CDF

It is well known that the expectation of a non-negative random variable $X$ can be written as

$\displaystyle \mathbb{E}[X] \overset{\underset{\mathrm{d}}{}}{=} \sum_{k=0}^\infty \mathbb{P}(X>k) \overset{\underset{\mathrm{c}}{}}{=} \int\limits_0^\infty \mathbb{P}(X>x)\,dx$

for the discrete and continuous cases.

It's quite easy to prove this, at least in the discrete case. It is interesting that, in the same vein, this can be extended to arbitrary functions. That is,

$\begin{align} \displaystyle \mathbb{E}[g(X)]&=\sum_{k=0}^\infty g(k)\mathbb{P}(X=k)\\ &=\sum_{k=0}^\infty \Delta g(k)\mathbb{P}(X>k)\\ \end{align}$

where $\Delta g(k)=g(k+1)-g(k)$ is the forward difference operator. WLOG, We also have made an assumptions that $g(0)=0$.

Comparing this with continuous case, we can make an 'educated guess' that

$\displaystyle \mathbb{E}[g(X)]=\int\limits_0^\infty \mathbb{P}(X>x)\,dg(x)$

We made a post about estimating a sum with probability where we showed the expected error in the approximation is given by

$\displaystyle \mathbb{E}(\delta)=\sum_{k=1}^\infty f(k)\frac{\binom{n-k}{m}}{\binom{n}{m}}$

Note that the term involving the ratio of binomial coefficients can be interpreted as the probability of the minimum-of-the-$n$-tuple being greater than $k$. Therefore,

$\displaystyle \mathbb{E}(\delta)=\sum_{k=1}^\infty f(k)\mathbb{P}(Y>k)$

where $Y$ denotes the smallest order statistic.

Comparing this with our expression for expectation, we see that the expected value of the (probabilistic) Right Riemann sum is

$\displaystyle \mathbb{E}[\text{Right Riemann sum}] \overset{\underset{\mathrm{d}}{}}{=} \mathbb{E}\left[\sum_{j=Y}^n f(j)\right] \overset{\underset{\mathrm{c}}{}}{=} \mathbb{E}\left[ \int\limits_Y^1 f(x)\,dx \right]$

Without going into further calculations, I'm guessing that

(i) $\displaystyle \mathbb{E}[\text{Left Riemann sum}] \overset{\underset{\mathrm{d}}{}}{=} \mathbb{E}\left[\sum_{j=0}^Z f(j)\right] \overset{\underset{\mathrm{c}}{}}{=} \mathbb{E}\left[ \int\limits_0^Z f(x)\,dx \right]$

(ii) $\displaystyle \mathbb{E}[\text{error in Trapezoidal sum}] \overset{\underset{\mathrm{d}}{}}{=} \frac{1}{2}\mathbb{E}\left[\sum_{j=Y}^Z f(j)\right] \overset{\underset{\mathrm{c}}{}}{=} \frac{1}{2}\mathbb{E}\left[ \int\limits_Y^Z f(x)\,dx \right]$

where $Z$ denotes the largest order statistic.

Hope you enjoyed the discussion. See ya in the next post.

Until then

Yours Aye

Monday, January 18, 2021

Probability on a standard deck of cards

I'm a great fan of Probability and Combinatorics. The question that we are going to solve in this post remains as one of my most favorite question in this area. I've been meaning to solve this one for so long and finally I'm glad I did in recently.

Consider a standard deck of 52 cards that is randomly shuffled. What is the probability that we do not have any King adjacent to any Queen after the shuffle?

Like I said, there were multiple instances where I used to think about this problem for long and then get distracted elsewhere. This time Possibly Wrong brought it back again to my attention. There is already a closed form solution there but I couldn't understand it. So, I decided to solve it Generating functions.

Stripping the problem down of all labels, the equivalent problem is to consider the number of words that we can form from the alphabet $\mathcal{A}=\{a,b,c\}$ with no occurrence of the sub-word $ab$ or $ba$. The $a$'s correspond to the Kings in the deck, $b$'s to the Queens and $c$'s to the rest of the cards.

Two important points here: First, We have to specifically find the number of words with 4 $a$'s, 4 $b$'s and 44 $c$'s. Second, convince yourself that both the problems are equivalent.

Pattern avoidance problems are easily tacked with the idea presented in Page 60 of 'amazing' Analytic Combinatorics.

Following the same, let $\mathcal{S}$ be the language words with no occurrence of $ab$ or $ba$, $\mathcal{T}_{ab}$ be those words that end with $ab$ but have no other occurrence of $ab$ and likewise for $\mathcal{T}_{ba}$ be those words that end with $ba$ but have no other occurrence of $ba$.

Appending a letter from $\mathcal{A}$ to $\mathcal{S}$, we find a non-empty word either in $\mathcal{S}$, $\mathcal{T}_{ab}$ or $\mathcal{T}_{ba}$. Therefore,

$\mathcal{S}+\mathcal{T}_{ab}+\mathcal{T}_{ba}=\{\epsilon\}+\mathcal{S}\times\mathcal{A}$

Appending $ab$ to $\mathcal{S}$, we either get a word from $\mathcal{T}_{ab}$ or a word from $\mathcal{T}_{ba}$ appending with $b$. Therefore,

$\mathcal{S}\times ab=\mathcal{T}_{ab}+\mathcal{T}_{ba}b$

Similarly, we have, $\mathcal{S}\times ba=\mathcal{T}_{ba}+\mathcal{T}_{ab}a$

The three equations in terms of OGFs become,

$S+T_{ab}+T_{ba}=1+S\cdot(a+b+c)$

$S\cdot ab=T_{ab}+T_{ba}\cdot b$

$S\cdot ba=T_{ba}+T_{ab}\cdot a$

We have three equations in three unknowns. Solving for $S$, we get,

$\displaystyle S=\frac{1-ab}{(1-a)(1-b)-(1-ab)c}$

which should be regarded as the generating function in terms of variables $a$, $b$ and $c$. So the coefficient of $a^ib^jc^k$ gives the number of words that avoid both $ab$ and $ba$ using $i$ $a$'s, $j$ $b$'s and $k$ $c$'s.

The coefficient of $c^k$ in this generating function is elementary. We get,

$\displaystyle [c^k]S=\frac{(1-ab)^{k+1}}{(1-a)^{k+1}(1-b)^{k+1}}=\left(\frac{1-ab}{(1-a)(1-b)}\right)^{k+1}$

Something interesting happens here. We note that

$\displaystyle \frac{1-ab}{(1-a)(1-b)}=1+a+b+a^2+b^2+a^3+b^3+a^4+b^4+\cdots$

Therefore the coefficient of $a^ib^j$ in the expansion $[c^k]S$ above has the following interpretation: It is the number of ways that a bipartite number $(i,j)$ can be written as sum of $k+1$ bipartite numbers of the form $(u,0)$ and $(0,v)$ with $u,v\geq0$.

This can be achieved with simple combinatorics. Of the $k+1$ numbers, choose $m$ numbers for $i$. Distribute $i$ in those $m$ numbers such that each numbers gets at least $1$. Distribute $j$ in the remaining $k+1-m$ numbers.

Therefore,

$\displaystyle [a^ib^jc^k]S=\sum_{m=1}^i \binom{k+1}{m}\binom{i-1}{m-1}\binom{j+k-m}{j}$

There we have it!

But, wait a minute.. The final answer gives us a very simple combinatorial way of arriving at the answer. Just lay down all the $k$ $c$-cards. Now there will be a total of $k+1$ gaps between those cards. Choose $m$ among them and distribute the $i$ $a$-cards in those $m$ spaces such that each space gets at least one card. Now distribute the $j$ $b$-cards in the remaining $k+1-m$ spaces.

Knowing this, I feel stupid to have gone through the generating function route to get the solution. Grrr...

Anyway, to get the desired probability we use $i=j=4$ and $k=44$,

$\displaystyle \mathbb{P}(\text{No King and Queen adjacent})=\frac{4!4!44!\sum_{m=1}^4 \binom{44+1}{m}\binom{4-1}{m-1}\binom{4+44-m}{4}}{52!}$

which matches with the answer given in the quoted page.

One more small note before we conclude. Take any two Aces out of the deck. Now the probability of no King adjacent to any Queen in this pack of 50 cards is given by using $i=j=4$ and $k=42$ which is $\approx 0.499087$, surprisingly close to $1/2$. Very nearly a fair bet!

I'm convinced that the author of the page left the answer in a specific form as hint for someone attempting to solve the problem but it didn't help me. But I'm glad that I was able to derive a simpler form of the solution with some intuition on why the solution looks the way it does. Hope you enjoyed this discussion.

Clear["Global`*"];

SeriesCoefficient[1/(

1 - a - b - c + (a b (a - 1 + b - 1))/(a b - 1)), {a, 0, 4}, {b, 0,

4}, {c, 0, 44}]/Multinomial[4, 4, 44]

1 - N[%]

1 - NSum[Binomial[44 + 1, m] Binomial[4 - 1, m - 1] Binomial[

4 + 44 - m, 4], {m, 4}]/Multinomial[4, 4, 44]

SeriesCoefficient[(

1 - a b)/((1 - a) (1 - b) - (1 - a b) c), {a, 0, 4}, {b, 0, 12}, {c,

0, 36}]/Multinomial[4, 12, 36]

1 - N[%]

1 - NSum[Binomial[36 + 1, m] Binomial[4 - 1, m - 1] Binomial[

12 + 36 - m, 12], {m, 4}]/Multinomial[4, 12, 36]

SeriesCoefficient[(

1 - a b)/((1 - a) (1 - b) - (1 - a b) c), {a, 0, 4}, {b, 0, 16}, {c,

0, 32}]/Multinomial[4, 16, 32]

1 - N[%]

1 - NSum[Binomial[32 + 1, m] Binomial[4 - 1, m - 1] Binomial[

16 + 32 - m, 16], {m, 4}]/Multinomial[4, 16, 32]

Until then

Yours Aye

Saturday, January 16, 2021

Probability on a Contingency Table

Contingency tables are quite common in understanding a classification problems like that of a ML model or new drug tested against a disease. Given that we are just recovering from a pandemic, let's stick to the case of a Machine Learning model. In the context of ML models, it is called the Confusion matrix and we'll use both the terms interchangeably in this post.

A 2x2 contingency table usually has two columns for the binary classification (Win vs. Lose, Apple vs. Orange, White vs. Black etc.) and two rows for whether the prediction was right or wrong. Let's consider the classification as a 'Hypothesis' and the model's prediction as an 'Evidence' supporting it.

Here is how our contingency table would look like.

Table 1	H	⌐H
E	n(H∩E)	n(⌐H∩E)
⌐E	n(H∩⌐E)	n(⌐H∩⌐E)

where $n(A)$ denotes the number of elements in set $A$.

We can normalize this table by dividing each of the four entries by the total thereby creating a new table.

Table 2	H	⌐H
E	$\mathbb{P}(H\cap E)$	$\mathbb{P}(\neg H\cap E)$
⌐E	$\mathbb{P}(H \cap \neg E)$	$\mathbb{P}(\neg H \cap \neg E)$

where we can view each entry as the probability of a classification falling in that bracket and $\mathbb{P}(A)$ denotes the probability of event $A$. Note that the sum of all the entries of Table 2 is 1.

The Wiki page on Confusion matrix gives a huge list of metrics that can be derived out of this table. In this post, we visit a couple of probability problems created from them.

Three of the important metrics that I learned in the context of ML from these matrices are

Precision, $\displaystyle p=\frac{\mathbb{P}(H\cap E)}{\mathbb{P}(H\cap E)+\mathbb{P}(\neg H\cap E)}=\frac{\mathbb{P}(H\cap E)}{\mathbb{P}(E)}=\mathbb{P}(H|E)$

Recall, $\displaystyle r=\frac{\mathbb{P}(H\cap E)}{\mathbb{P}(H\cap E)+\mathbb{P}(H\cap \neg E)}=\frac{\mathbb{P}(H\cap E)}{\mathbb{P}(H)}=\mathbb{P}(E|H)$

and Accuracy, $a=\mathbb{P}(H \cap E)+\mathbb{P}(\neg H \cap \neg E)$

Suppose, we want to create a random (normalized) confusion matrix. One way to do this would be create random variables all between 0 and 1, that also sum to 1. We can use Dirichlet distribution with four parameters to achieve this.

But there may be instances where we want to create a confusion matrix with a given precision, recall and accuracy. There are four entries in the table. Given three metrics and the fact that the entries should add upto 1 would seem to suggest, that these completely define the table.

But not all such values can create a valid table. For example, its impossible to create a valid table with a precision of 66.7%, recall 90.0% and accuracy 60%. So our first question is, given that precision, recall and accuracy are all uniformly distributed random variables, what is the probability that we will end up with a valid table.

To produce a valid table, the three variables need to satisfy the condition

$\displaystyle a \geq \frac{pr}{1-(1-p)(1-r)}$

Let $T$ be event of getting a valid table. Using the above we have,

$\displaystyle \mathbb{P}(T)=\int\limits_0^1\int\limits_0^1\int\limits_0^1\left[ a \geq \frac{pr}{1-(1-p)(1-r)} \right]\,dp\,dr\,da$

For a moment, let's assume we assume that $a$ is given. Then we first solve

$\displaystyle F(a)=\int\limits_0^1\int\limits_0^1\left[a \geq \frac{pr}{1-(1-p)(1-r)}\right]\,dp\,dr$

We can simplify the expression inside the Iverson bracket as a curve in the $r-p$ plane. The equation of the curve is given by

$\displaystyle r=\frac{ap}{(1+a)p-a}$

Plotting the region for $a=1/4$, we get the following graph.

The region to be integrated lies between the curve and the two axes. We can divide this region along the $p=r$ line. This line intersects the graph at $\left(\frac{2a}{1+a},\frac{2a}{1+a}\right)$. Therefore,

$\displaystyle F(a)=\frac{4a^2}{(1+a)^2}+2\int\limits_{\frac{2a}{1+a}}^1\frac{a p}{(1+a)p-a}\,dp$

Solving the second integral with Wolfram Alpha, we get,

$\displaystyle F(a)=\frac{4a^2}{(1+a)^2}+2\frac{a(1-a)}{(1+a)^2}-2\frac{a^2\log{a}}{(1+a)^2}=\frac{2a}{1+a}-\frac{2a^2\log{a}}{(1+a)^2}$

Plugging this back in our original equation and integrating, we see that,

$\displaystyle \mathbb{P}(T)=\int\limits_0^1\left(\frac{2a}{1+a}-\frac{2a^2\log{a}}{(1+a)^2}\right)\,da=4-\frac{\pi^2}{3}\approx 0.710132$

Thus we see that the just about 29% of the tables will not be valid. Something that truly surprised me here is the fact that $\pi$ makes an appearance here. There are no circles (not even something close to that) in this problem!!

The second problem is we see is also quite similar. Now, assume that we create tables (like that of Table 2) such that the values are uniformly distributed and sum to 1. If we want the precision and recall of our random tables to be greater than some threshold, what would be expected accuracy of the table?

For clarity, let $\mathbb{P}(H \cap E)=X$, $\mathbb{P}(\neg H \cap E)=Y$, $\mathbb{P}(H \cap \neg E)=Z$ and $\mathbb{P}(\neg H \cap \neg E)=W$, then $(X,Y,Z,W)\sim \text{Dir}(1,1,1,1)$

$\displaystyle \mathbb{E}(1-Y-Z|\mathbb{P}(E|H)\geq r,\mathbb{P}(H|E)\geq p)=\frac{1}{V}\int\limits_Q 1-y-z \,dx\,dy\,dz$

where $Q$ is the region such that

$Q=\{(x,y,z):(1-p)x\geq py \land (1-r)x\geq rz \land x+y+z \leq 1 \land x,y,z\geq 0\}$

and $V$ is the volume enclosed by $Q$.

Evaluating this integral is not so easy. The region integration depends on the value of $p$ and $r$ and it kind of ends in a mess of equations. But with some luck and a lot of Mathematica, we can see

$\displaystyle \mathbb{E}(\mathbb{P}(H \cap E)+\mathbb{P}(\neg H \cap \neg E)\text{ }|\text{ }\mathbb{P}(E|H)\geq r,\mathbb{P}(H|E)\geq p)=\frac{2(p+r)^2+p^3+r^3-pr(p^2+r^2)}{4(p+r)(1-(1-p)(1-r))}$

I have no way of making sense of that expression but, hey, we have an expectation on probabilities!

Hope you enjoyed this discussion.

Clear["Global`*"];

p = 200/1000; r = 250/1000;

ImpReg[a_, p_, r_] :=

ImplicitRegion[

y + z <= 1 - a && (1 - p) x >= p y && (1 - r) x >= r z && 0 <= x &&

0 <= y && 0 <= z && x + y + z <= 1, {x, y, z}];

ImpRegap[a_, p_] :=

ImplicitRegion[

y + z <= 1 - a && (1 - p) x >= p y && 0 <= x && 0 <= y && 0 <= z &&

x + y + z <= 1, {x, y, z}];

ImpRegpr[p_, r_] :=

ImplicitRegion[(1 - p) x >= p y && (1 - r) x >= r z && 0 <= x &&

0 <= y && 0 <= z && x + y + z <= 1, {x, y, z}];

ImpRegra[r_, a_] :=

ImplicitRegion[

y + z <= 1 - a && (1 - r) x >= r z && 0 <= x && 0 <= y && 0 <= z &&

x + y + z <= 1, {x, y, z}];

ExpecAcc[p_, r_] := (-2 p^2 - p^3 - 4 p r + p^3 r - 2 r^2 - r^3 +

p r^3)/(4 (p + r) (-p - r + p r));

ExpecAcc[p, r]

N[%]

lim = 1000000;

cnt = 0;

val = 0;

Do[

x = -Log[RandomReal[]]; y = -Log[RandomReal[]];

z = -Log[RandomReal[]]; w = -Log[RandomReal[]];

t = w + x + y + z;

x /= t; w /= t; y /= t; z /= t;

If[And[x/(x + y) >= p, x/(x + z) >= r],

cnt += 1; val += 1 - y - z;

];

, {i, lim}];

N[val/cnt]

Until then

Yours Aye

Thursday, January 7, 2021

Isochrone on a Spherical surface

The Cycloid in all its grandeur scooped all the glory for itself by way of being both the brachistochrone curve and the tautochrone curve.

But, seemingly out of some dark magic, the semi-cubical parabola makes its way as the (vertical) isochrone curve, a curve on which a bead sliding without friction covers equal vertical distances in equal intervals of time.

In my last post, we found the differential equation of a tautochrone curve on a spherical surface and used it to find the curve. In this post, we kind of continue the same discussion.

Our aim in this post is to find a curve on the surface of a sphere such that a bead sliding (without friction) under the influence of gravity covers equal polar angles at equal intervals of time. In other words, we are trying to find the (polar) Isochrone curve on the spherical surface.

For reasons that'll be apparent later, we modify our problem construct a little. Let's assume that the bead starts at the north pole and slides (without friction) slowly along the $\phi=0$ plane from $\theta=0$ to $\theta=\theta_0$ where the bead seamlessly enters the curve.

Let $\omega=d\theta/dt$ be the constant polar velocity as the bead enters the isochrone curve. Also, note that, because of the way the bead travels before entering the curve, the azimuthal velocity will be zero at the entry point.

Using the differential arclength of a curve on a (unit) spherical surface, we have,

$\displaystyle \left(\frac{ds}{dt}\right)^2 =\left(\frac{d\theta}{dt}\right)^2+\sin^2\theta\left(\frac{d\phi}{dt}\right)^2=\left(\frac{d\theta}{dt}\right)^2\left(1+\sin^2\theta\left(\frac{d\phi}{d\theta}\right)^2\right)$

Let the plane tangent to the sphere at the south pole be the 'base' from which we measure the gravitational potential energy. Then, using the law of conservation of energy at $\theta=\theta_0$ (just after the bead entered the isochrone) and any point beyond,

$\displaystyle \omega^2\left(1+\sin^2\theta\left(\frac{d\phi}{d\theta}\right)^2\right)+2g(1+\cos\theta)=\omega^2+2g(1+\cos\theta_0)$

Simplifying this,

$\displaystyle \frac{d\phi}{d\theta}=\frac{\sqrt{2g}}{\omega}\frac{\sqrt{\cos\theta_0-\cos\theta}}{\sin\theta}$

It is easy to express $\omega$ in terms of $\theta_0$. Using the law of conservation of energy at $\theta=0$ and $\theta=\theta_0$ (just before the bead enters the isochrone), it is easy to see,

$\omega=\sqrt{2g(1-\cos\theta_0)}$

Using this in the expression above, we finally have,

$\displaystyle \frac{d\phi}{d\theta}=\frac{1}{\sin\theta}\sqrt{\frac{\cos\theta_0-\cos\theta}{1-\cos\theta_0}}$

(WARNING: Now is the time where I give you a fair warning to keep your mind in a secured place because it is about to be blown.)

Looking at the above differential equation, it is clear that it is the same equation we found in our previous post for the tautochrone problem. The same curve solves both the problems just like the cycloid does in plane geometry!!!

This says, if you place the bead at any point on our curve, the time it takes to reach the south pole is the same and hence becomes tautochrone. But if you place the bead at the north pole and let it slide itself into the curve, then it covers equal polar angles at equal intervals of times and hence becomes (polar) isochrone.

Truly, some dark magic stuff going on. I would have never expected in any way that such a weird coincidence would happen in the spherical case. I truly enjoyed this. Hope you did too. See you in the next post.

Until Then

Yours Aye

Monday, November 30, 2020

Tautochrone curve on a Spherical surface

The Tautochrone problem is one of the classical problems in math. Both the Wikipedia and the Mathworld page present some great math on the problem and makes for an enjoyable read.

I was going through these pages one more time recently and I thought "All of this is fine. But is there a tautochrone curve on a sphere?". Turns out the answer is yes and this post is all about finding such a curve.

The simplest way to attack this problem is to use Able's solution idea. It says,

$\displaystyle \frac{ds}{dy}=\pm\sqrt{2g}\frac{T_0}{\pi}\frac{1}{\sqrt{y}}$

where $T_0$ is the constant time of travel, $g$ is the gravitational constant, $s$ is the arc length of the curve and $y$ is the measure of the gravitational potential energy.

Let's consider a unit sphere centered at the origin. Also, Let the plane tangent to the south pole of the sphere be the 'base' for measuring the gravitational potential.

As we are working on a spherical surface, we'll naturally use spherical geometry for this problem with $\theta$ as the polar angle and $\phi$ as the azimuthal angle. We have,

$y=1+\cos\theta$ which implies $dy=-\sin\theta d\theta$.

Therefore, Abel's equation becomes,

$\displaystyle ds=\sqrt{2g}\frac{T_0}{\pi}\frac{\sin\theta}{\sqrt{1+\cos\theta}}d\theta=\sqrt{2g}\frac{T_0}{\pi}\sqrt{1-\cos\theta}\text{ }d\theta$

The differential arc length in a spherical coordinate system is given by,

$\displaystyle \frac{ds}{d\theta}=\sqrt{1+\sin^2\theta\left(\frac{d\phi}{d\theta}\right)^2}$

Using this and simplifying, we get

$\displaystyle \sin^2\theta\left(\frac{d\phi}{d\theta}\right)^2=2g\left(\frac{T_0}{\pi}\right)^2(1-\cos\theta)-1$

Now, let $2g(T_0/\pi)^2=1/(1-\cos\lambda)$ where $\lambda$ is a constant. Then the above equation with further simplication becomes,

$\displaystyle \frac{d\phi}{d\theta}=\frac{1}{\sin\theta}\sqrt{\frac{\cos\lambda-\cos\theta}{1-\cos\lambda}}$

This is the differential equation that must be satisfied by the tautochrone curve.

We can solve this with the same substitution used in the Mathworld page. We then have,

$\displaystyle \phi=\cot(\lambda/2)\tanh^{-1}\left(\sqrt{\frac{\cos\lambda-\cos\theta}{1+\cos\lambda}}\right)-\tan^{-1}\left(\sqrt{\frac{\cos\lambda-\cos\theta}{1-\cos\lambda}}\right)$

The above expression should also have an arbitrary constant of integration, but let's say we only care about the shape of the curve and hence leave it out of this analysis.

Also, from the expression above, we see that the curve stats at $\theta=\lambda$ and spirals its way down to the south pole. We thus know, once we specify $T_0$, we indirectly specify where the curve should start from and vice versa.

Tautochrone curve with $\lambda=\pi/8$

We know the time of descent of the tautochrone curve is $T_0$. The arc length $L$ of this tautochrone curve is given by,

$\displaystyle L(\theta_0)=\int ds=\int \frac{ds}{d\theta}d\theta=\sqrt{2g}\frac{T_0}{\pi}\int\limits_{\theta_0}^\pi \sqrt{1-\cos\theta}d\theta=\sqrt{2g}\frac{T_0}{\pi}\cdot2\sqrt{2}\cos(\theta_0/2)$

or in terms of $\lambda$, we have,

$\displaystyle L(\theta_0)=2\frac{\cos\theta_0/2}{\sin\lambda/2}$

All of this absolutely amazing. But the story isn't complete yet. There is something more to this tautochrone curve that truly blows our mind which I'll share in the next post. Hope you enjoyed the post.

Mathematica code for the above curve:

Clear["Global`*"];

\[Alpha] = 45 Degree/2;

Phif[\[Theta]_] :=

Cot[\[Alpha]/2] ArcTanh[Sqrt[(Cos[\[Alpha]] - Cos[\[Theta]])/(

1 + Cos[\[Alpha]])]] -

ArcTan[Sqrt[(Cos[\[Alpha]] - Cos[\[Theta]])/(1 - Cos[\[Alpha]])]];

Cartf[\[Theta]_] := {Cos[Phif[\[Theta]]] Sin[\[Theta]],

Sin[\[Theta]] Sin[Phif[\[Theta]]], Cos[\[Theta]]};

Show[Graphics3D[{Opacity[1/5], White, Sphere[]}],

ParametricPlot3D[Cartf[\[Theta]], {\[Theta], 0 Degree, 180 Degree},

PlotStyle -> Thick, ColorFunction -> (Hue[0.8 #4] &)],

ParametricPlot3D[{Cos[t], Sin[t], 0}, {t, 0, 2 Pi}],

ParametricPlot3D[{Cos[t], 0, Sin[t]}, {t, 0, 2 Pi}],

ParametricPlot3D[{0, Cos[t], Sin[t]}, {t, 0, 2 Pi}],

ImageSize -> Large, Boxed -> False]

See here to paste this code into Mathematica with proper formatting.

Until then

Yours Aye

Monday, August 3, 2020

Gergonnes's Solution to Apollonius' Problem

I was recently revisiting my earlier post on A nice result in Geometry and got 'distracted' into the Classical Apollonius problem. The fact that this problem has attracted the interests some of the most famous mathematicians throughout history tells us how beautiful this problem really is.

As always, the wiki page was a great source of information for me and it gives a very nice account on Gergonne's solution to the problem which, needless to say, is ingenious. Meanwhile, I was also consulting Cut-The-Knot's version of Gergonne's solution.

Both the solutions seemed similar but there was one subtle difference. To make it more clear (and readable), I (re)created the diagrams on Wikipedia using Geogebra. Before we dwell further, let's list out the key labels used in the construction.

In what follows , $i \in \{1,2,3\}$ and $X \in \{A,B\}$

$C_1,C_2,C_3$ - The given circles. In general, there are eight circles that are mutually tangent to all these circles
$C_O$ - Circle orthogonal to all the three given circles
$G$ - Radical center of the given circles. Also, center of the Orthogonal circle $C_O$
$C_A,C_B$ - A pair (out of the possible eight) of solution circles
$X_i$ - Points where the solution circle $C_X$ touches Circle $C_i$
$l_i$ - Line passing through $A_i$ and $B_i$. As $A_i$ and $B_i$ are inverses to each other w.r.t to $C_O$, this line also contains the Radical center $G$; also the polar of point $T_i$ defined below
$p$ - (One out of the possible four) Homothetic Axis. This line also happens to be the radical axis of the solution circles
$P_i$ - Pole of the Homothetic axis $p$ w.r.t to circle $C_i$
$J_i, K_i$ - Points of intersection of the Orthogonal circle $C_O$ and circle $C_i$
$T_i$ - Point of intersection of the line joining $J_i$ and $K_i$, and the Homothetic axis $p$; Also, the pole of line $l_i$ defined above
$O_1,O_2$ - Points of intersection between the Orthogonal circle $C_O$ and Homothetic axis $p$
$C_I$ - Circle with center at $O_1$ and passing through $O_2$

Gergonne's Solution - Wikipedia
(i) Construct $p$, the Homothetic axes (or axes of similitude) (ii) Construct the Radical center $G$ of the three given circles (iii) Construct the poles $P_i$ of the Homothetic axes w.r.t the given circles.

Points $P_i$ are the objects of interest in Wikipedia version of Gergonne's Solution

Now the line joining $G$ and $P_i$ cuts the circle $C_i$ in $A_i$ and $B_i$ giving six points in total. These six points determine the solution circles.

Gergonne's Solution - Cut-The-Knot
(i) Construct $p$, the Homothetic axes (or axes of similitude) (ii) Construct the circle orthogonal to the given circles $C_O$ (iii) Construct the Radical axis of the Orthogonal circle and each of the given circles (that is, the line joining $J_i$ and $K_i$); this intersects the Homothetic axes in $T_i$.

Points $T_i$ are the objects of interest in Cut-The-Knot version of Gergonne's Solution

Now construct tangent from $T_i$ to each of the given circles.These points then determine the solution circles.

The third step in these solutions were the source of my confusion. Especially, while Wikipeida's version needs an inversion (constructing the poles $P_i$), Cut-the-Knot's version needs no inversion at all. This baffled me and it took me some time (two days to be exact) to figure out how this was possible.

The explanation given in Cut-The-Knot did not seem to clarify this point. It quotes Four Touching Circles IV, which does not add anything in relation to the third step in the construction.

Gergonne's solution relies crucially on the inversive relation between poles and polars of conics. In particular, it exploits the fact that if the pole $P$ of a line $p$ w.r.t a conic lies on a line $q$, then the pole $Q$ of line $q$ w.r.t the same conic lies on line $p$. Wikipedia version (and most descriptions of Gergonne's solution) makes use of this relation.

Now, Cut-The-Knot's solution uses the mirror image of the same relation. That is, if the polar $p$ of a point $P$ contains a point $Q$, then the polar $q$ of point $Q$ w.r.t the same conic, contains the point $P$. And hence, it cleverly avoids any explicit inversion construction. Here's how it works.

From Wikipedia, we know that point $T_i$ is the pole of the (polar) line $l_i$ w.r.t circle $C_i$. Now the polar of point $T_i$ contains the point $G$. Hence, the polar of $G$ must contain $T_i$.

Here's comes the nice part. Given any pair of orthogonal circles, the polar of center of either circle w.r.t to the other circle is their radical axis. This is very obvious from the construction of polars and orthogonal circles.

So in essence, we are actually constructing a polar (rather than a pole in Wikipedia's version) of point. However, because of the clever choice, it just happens to be the point of intersection of circles already constructed avoiding the need for any inversion.

That was indeed neat. Hope you agree.

This whole process of revisiting one of the classical problems in Geometry was really nice experience for me. I greatly enjoyed creating the Geogebra drawing of the solution. In fact, I came up with something on my own.

Whenever, the Homothetic axis $p$ meets the Orthogonal circle $C_O$, something very interesting happens. As the solution circles are inverses of each other under $C_O$, the points of intersection between $p$ and $C_O$ also happens to be the points of intersection between $C_A$ and $C_B$.

Now draw a circle $C_I$ with center at one of the points of intersection (Point $O_1$) and passing through the other (Point $O_2$). Use this as a circle of inversion.

As the Orthogonal circle passes through the center of inversion, it becomes a straight line $C_O'$.
The three given circles, being orthogonal to the $C_O$, transforms into circles with their center on the line above.
The solution circles too passes through the inversion center and they too transform into straight lines ($C_A'$ and $C_B'$).
The (other) point of intersection $O_2$ of the solution circles passes through the inversion circle $C_I$ and hence this point is where the solution lines $C_A'$ and $C_B'$ meet. That is, point $O_2$ becomes the Homothetic center of the (inverted) given circles.

The given circles (not shown here) inverted under $C_I$ form a set of Homothetic circles with center $O_2$.
The Orthogonal circles and both the solution circles transforms to straight lines.

In summary, if $p$ meets $C_O$, we can construct a circle $C_I$ with center at $O_1$ and passing through $O_2$. Invert any of the given circle and draw tangent lines from $O_2$ to the inverted circle. Inverting back these tangent lines gives the required solution circles.

While constructing inversion is harder (needs more steps) while doing manually, this method is relatively simpler if we have an 'macro' for inversion (as in Computer graphics systems like Geogebra).

But the main drawback about this method is that it only works when the Homothetic axes intersects the Orthogonal circle which, sadly, is not guaranteed when the Homothetic axis contains two external Homothetic centers. Even in this case, we can actually develop a solution using the concept of mid-circles. This is not complicated but is more laborious. Gergonne's solution can be used in these cases.

That's all for today. Hope you enjoyed it.

Until then
Yours aye
Me

Wednesday, June 17, 2020

Error terms of the Exponential Function's limit form

One of the classical characterizations of the Exponential function is to define it by the following limit.

$e^x=\displaystyle\lim_{n \to \infty}\left(1+\frac{x}{n}\right)^n$

Characterizations of the exponential function gives the following as the first few error terms of this limi-expression,

$\displaystyle\left(1+\frac{x}{n}\right)^n=e^x\left(1-\frac{x^2}{2n}+\frac{x^3(8+3x)}{24n^2}-\cdots\right)$

stating that the numerators of the $n^k$ terms are polynomials in $x$ of degree $2k$. However, no mention is made about the computation of the coefficients or their relation with each other (if there is any).

That exactly is the point of this post. We try to derive an expression which allows us to calculate the coefficients as we desire. Considering that this definition of the Exponential function is all over the internet, I was disappointed that I could not find any reference regarding these coefficients.

Based on Wikipedia's expression, we suspect that,

$\displaystyle e^{-x}\left(1+\frac{x}{n}\right)^n=\sum_{i\geq j \geq 0}(-1)^{i}p_{i,j}\frac{x^{i+j}}{n^i}$

where we seek to compute the coefficients $p_{i,j}$. Now using the substitutions $x \to -nx$ and $n \to -1/n$ (in that order), we can see that,

$e^{-x/n}(1-x)^{-1/n}=\displaystyle\sum_{i\geq j \geq 0}p_{i,j}\frac{x^{i+j}}{n^j}$

But $e^{-x/n}(1-x)^{-1/n}=e^{-(x+\ln{(1-x)})/n}=\displaystyle\sum_{j=0}^\infty(-x+\ln{(1-x)^{-1}})^j\frac{1}{n^jj!}$

Comparing the coefficients of powers of $n$, we have,

$x^jp_j(x)=\displaystyle x^j\sum_{i=0}^\infty p_{i,j}x^i=\frac{1}{j!}(-x+\ln{(1-x)^{-1}})^j$

This readily gives a combinatorial representation. That is, $(i+j)!p_{i,j}$ is the number of derangements of $[1,2,3,\cdots,i+j]$ with exactly $j$ cycles. Therefore,

$\displaystyle (i+j)!p_{i,j}=\sum_{k=0}^j (-1)^{j-k}\binom{i+j}{i+k}\begin{bmatrix}i+k\\k\end{bmatrix}=\sum_{k=0}^j (-1)^{i+j-k}\binom{i+j}{i+k}s(i+k,k)$

where $\begin{bmatrix}n\\k\end{bmatrix}$(and $s(n,k)$) are the unsigned (and signed) Stirling numbers of the first kind.

Also, using the same derangement interpretation, we can easily see that,

$\displaystyle \sum_{j=0}^i i!p_{i-j,j}=i!\sum_{j=0}^i\frac{(-1)^j}{j!}\implies \sum_{j=0}^i p_{i-j,j}=\sum_{j=0}^i\frac{(-1)^j}{j!}$

On the other hand, we can derive a recurrence for $p_{i,j}$'s by manipulating $x^jp_j(x)$.

$jxp_j(x)=p_{j-1}(x)(-x+\ln(1-x)^{-1})=\displaystyle p_{j-1}(x)\left(\frac{x^2}{2}+\frac{x^3}{3}+\frac{x^4}{4}+\cdots\right)$

Comparing coefficients of powers of $x$, we have,

$p_{i,j}=\displaystyle\frac{1}{j}\sum_{k=2}^{i+1}\frac{1}{k}p_{i-k+1,j-1}=\frac{1}{j}\sum_{k=0}^{i-1}\frac{p_{k,j-1}}{i-k+1}$

Together with $p_{0,0}=1$ and $p_{i,j}=0$ when $i<j$ (or) $j=0$ enables us to calculate all the coefficients.

Further, we can differentiate $x^jp_j(x)$ to get a different recurrence.

$\displaystyle p_{i,j}=\frac{1}{i+j}p_{i-1,j-1}+\left(1-\frac{1}{i+j}\right)p_{i-1,j}$

This is really magical as we now have a probabilistic interpretation for the $p_{i,j}$'s.

Consider a bot that is initially at point $(i,j)$ with $i\geq j$ in the $x$-$y$ plane. The bot is programmed to move towards the origin such that at each step it either moves down-left to $(i-1,j-1)$ with some probability or moves left to point $(i-1,j)$ with the complementary probability.

If the probability of moving down-right is the inverse of the Manhattan distance between the bot's current position and the origin, then $p_{i,j}$ is the probability that bot reaches the origin without ever touching the $x$-axis or crossing the $x=y$ line.

This is simply wonderful as there does not seem to be a connection between the above probability and the error terms of the exponential function. Again the recurrence with the same boundary conditions as before gives us all the $p_{i,j}$'s.

We conclude the post with more terms of the error terms.

$\displaystyle\left(1+\frac{x}{n}\right)^n=e^x\left(1-\frac{x}{2}\frac{x}{n}+\left(\frac{x}{3}+\frac{x^2}{8}\right)\frac{x^2}{n^2}-\left(\frac{x}{4}+\frac{x^2}{6}+\frac{x^3}{48}\right)\frac{x^3}{n^3}+\left(\frac{x}{5}+\frac{13x^2}{72}+\frac{x^3}{24}+\frac{x^4}{384}\right)\frac{x^4}{n^4}-\cdots\right)$

Noting the linear term in $n$, I had the idea of using the generalized Binomial series to get the following alternate limit form. This is a better form of the limit expression as the resulting asymptotic series does not have linear term either in $n$ or in $x$.

$e^x=\displaystyle\lim_{n \to \infty}\left(1+\frac{x}{n}\right)^{n+x/2}$

That's all for today. Hope you enjoyed it.

Until then
Yours Aye
Me