Friday, April 30, 2021

A Probability problem on an Election result

A friend of mine recently posed the following problem to me: Given two contestants in an election $X$ and $Y$ (with equal popularity among the voters), what is the probability that the contestant leading the election after 80% of polling eventually loses the election?

I misunderstood the question in ways more than one and, not wanting to use paper-pencil, solved this problem with an answer of $(2/\pi)\tan^{-1}4$ which was wrong.

I enquired with him the source of the problem which also has a solution. But the solution there seems very convoluted and needs Wolfram Alpha to get a closed form. Seeing the solution, I realized that I have misunderstood the question and that my method is a lot easier.

Let $X_1,Y_1$ be the votes received by the contestants in after 80% of polling respectively and $X_2,Y_2$ be the votes respectively in the remaining 20% polling. For the sake of simplicity, let the total number of votes be $20n$ for a large $n$.

We know that, if $X\sim \text{Bin}(m,p)$, then for large $m$, $X$ is approximately distributed as $\mathcal{N}(mp,mpq)$. Therefore, for $p=q=1/2$,

$\displaystyle X_1,Y_1\sim \mathcal{N}\left(\frac{16n}{2}, \frac{16n}{4}\right)$ and $\displaystyle X_2,Y_2 \sim \mathcal{N}\left(\frac{4n}{2}, \frac{4n}{4}\right)$

Let $E$ be the event denoting the player trailing after 80% polling eventually wins the election. Then,

$\displaystyle \mathbb{P}(E)=2\cdot \frac{1}{2}\cdot \mathbb{P}(X_1+X_2 \leq Y_1+Y_2 \text{ and }X_1 \geq Y_1)$

We can rewrite the same to get

$\mathbb{P}(E)=\mathbb{P}(X_1-Y_1 \leq Y_2 - X_2 \text{ and }X_1-Y_1 \geq 0)$

We also know that if $U\sim \mathcal{N}(\mu_1,\sigma_1^2)$ and $V\sim \mathcal{N}(\mu_2,\sigma_2^2)$, then $aU+bV\sim \mathcal{N}(a\mu_1+b\mu_2,a^2\sigma_1^2+b^2\sigma_2^2)$

Therefore, $X_1-Y_1 \sim \mathcal{N}\left(0,8n\right)=\sqrt{8n}Z_1$ and $X_2-Y_2\sim \mathcal{N}\left(0,2n\right)=\sqrt{2n}Z_2$

where $Z_1$ and $Z_2$ are standard Normal variables.

Therefore,

$\begin{align}\displaystyle \mathbb{P}(E)&=\mathbb{P}(2\sqrt{2n}Z_1\leq \sqrt{2n}Z_2 \text{ and } 2\sqrt{2n}Z_1 \geq 0)\\ &=\mathbb{P}(2Z_1\leq Z_2 \text{ and } Z_1 \geq 0) \\ &= \mathbb{P}(Z_1 \leq Z_2/2 \text{ and }Z_1 \geq 0)\\ &=\mathbb{P}(0 \leq Z_1 \leq Z_2/2) \\ &= \mathbb{P}\left(0 \leq \frac{Z_1}{Z_2} \leq \frac{1}{2}\right)\\ &= \mathbb{P}\left(0 \leq W \leq \frac{1}{2}\right)\\ \end{align}$

where $W$ is the ratio of two standard Normal distributions and hence a Cauchy random variable. As the CDF of a Cauchy variable has a closed form in terms of arctangent function, we finally have

$\displaystyle \mathbb{P}(E)=\frac{1}{\pi}\tan^{-1}\left(\frac{1}{2}\right)\approx 0.1475$

If $E$ denotes the event that the contestant trailing when the fraction $a$ of votes remains to counted eventually wins the election. then

$\displaystyle \mathbb{P}(E)=\frac{1}{\pi}\tan^{-1}\left(\sqrt{\frac{a}{1-a}}\right)=\frac{1}{\pi}\sin^{-1}\sqrt{a}$

Hope you enjoyed the discussion. See ya in the next post.


Until Then
Yours Aye
Me

Saturday, April 10, 2021

Expected Value in terms of CDF

It is well known that the expectation of a non-negative random variable $X$ can be written as

$\displaystyle \mathbb{E}[X] \overset{\underset{\mathrm{d}}{}}{=} \sum_{k=0}^\infty \mathbb{P}(X>k)  \overset{\underset{\mathrm{c}}{}}{=} \int\limits_0^\infty \mathbb{P}(X>x)\,dx$

for the discrete and continuous cases.

It's quite easy to prove this, at least in the discrete case. It is interesting that, in the same vein, this can be extended to arbitrary functions. That is,

$\begin{align} \displaystyle \mathbb{E}[g(X)]&=\sum_{k=0}^\infty g(k)\mathbb{P}(X=k)\\ &=\sum_{k=0}^\infty \Delta g(k)\mathbb{P}(X>k)\\ \end{align}$

where $\Delta g(k)=g(k+1)-g(k)$ is the forward difference operator. WLOG, We also have made an assumptions that $g(0)=0$.

Comparing this with continuous case, we can make an 'educated guess' that

$\displaystyle \mathbb{E}[g(X)]=\int\limits_0^\infty \mathbb{P}(X>x)\,dg(x)$

We made a post about estimating a sum with probability where we showed the expected error in the approximation is given by

$\displaystyle \mathbb{E}(\delta)=\sum_{k=1}^\infty f(k)\frac{\binom{n-k}{m}}{\binom{n}{m}}$

Note that the term involving the ratio of binomial coefficients can be interpreted as the probability of the minimum-of-the-$n$-tuple being greater than $k$. Therefore,

$\displaystyle \mathbb{E}(\delta)=\sum_{k=1}^\infty f(k)\mathbb{P}(Y>k)$

where $Y$ denotes the smallest order statistic.

Comparing this with our expression for expectation, we see that the expected value of the (probabilistic) Right Riemann sum is

$\displaystyle \mathbb{E}[\text{Right Riemann sum}]  \overset{\underset{\mathrm{d}}{}}{=} \mathbb{E}\left[\sum_{j=Y}^n f(j)\right]  \overset{\underset{\mathrm{c}}{}}{=} \mathbb{E}\left[ \int\limits_Y^1 f(x)\,dx \right]$

Without going into further calculations, I'm guessing that

(i) $\displaystyle \mathbb{E}[\text{Left Riemann sum}]  \overset{\underset{\mathrm{d}}{}}{=} \mathbb{E}\left[\sum_{j=0}^Z f(j)\right]  \overset{\underset{\mathrm{c}}{}}{=} \mathbb{E}\left[ \int\limits_0^Z f(x)\,dx \right]$

(ii) $\displaystyle \mathbb{E}[\text{error in Trapezoidal sum}]  \overset{\underset{\mathrm{d}}{}}{=} \frac{1}{2}\mathbb{E}\left[\sum_{j=Y}^Z f(j)\right]  \overset{\underset{\mathrm{c}}{}}{=} \frac{1}{2}\mathbb{E}\left[ \int\limits_Y^Z f(x)\,dx \right]$

where $Z$ denotes the largest order statistic.

Hope you enjoyed the discussion. See ya in the next post.


Until then
Yours Aye
Me