Friday, January 28, 2022

Matt Parker and the London Lottery

In his (aptly) named video 'Matt Explains: The Lottery', Matt explains the probabilities associated with the London lottery. In the first video he made on this topic, he worked out the probability of winning the Jackpot with the new system where we have a chance to win an extra lottery.

To make things clear, here are the winnings that I understood from his video. Six numbers are drawn (without repetition) from a pool of 59 numbers. If all 6 numbers match, then we win the Jackpot. If we match 5 numbers, then we get a bonus number whose match win us 50000 pounds and 1000 pounds otherwise. If we match 4 numbers, we win a 100 pounds and 3 matches wins us 25 pounds. Finally, under the new system, matching 2 numbers gives us no cash prize but gives us a new lottery which we can use the next time.

The number of matches are given by the Hypergeometric distribution as explained (and calculated) by Matt in the video. In his first video, Matt ignores all cash prizes and focuses only on the Jackpot and the extra lottery.

If we let $p_k$ denote the probability of $k$ matches and $p$ denote the probability of ultimately winning the Jackpot, we have

$p=p_6+p_2p$ which upon simplification gives $\displaystyle p = \frac{p_6}{1-p_2}$

Substituting the known values, gives a probability of $p=2.459111 \times 10^{-8}$ (or) an Odds of 1 in 40665098.

As we will be dealing with extremely small probabilities, let's define 1 micro-bps as a millionth of a basis point. In this sense, the probability in the above case is approx. 245.91 micro-bps.

At this point, as per a Subscriber's suggestion, Matt tries to attack the problem where we repeatedly buy lotteries using all our winnings until we win the Jackpot. Here Matt calculates the expected number of lotteries won with a given lottery.

If we let $l$ denote the expected number of lotteries won per lottery, then

$l=p_2 \cdot 1 + p_3 \cdot 12.5 + p_4 \cdot 50 + p_{5,1} \cdot 500 + p_{5,2} \cdot 25000 \approx 0.257191$

where we have used the fact that each lottery cost 2 pounds. Matt now uses the following equation (indirectly), to get the probability of winning the Jackpot.

$p=p_6+l \cdot p \tag{1} \label{wrongeqn}$ which gives $\displaystyle p = p_6/(1-l)$.

Using the known values, we get $p=298.7830655$ micro-bps (or) an Odds of about 1 in 33469098.

But something is definitely amiss in $\eqref{wrongeqn}$. For example, we can easily construct a case where $l > 1$ which will then give negative probabilities. What then is the error, if any, in Matt's reasoning? Or is there a hidden assumption that was not explicitly stated here?

To understand whats going on here, we should use idea that the probability of getting at least one win in $n$ trials is $1-(1-p)^n$ and interpret $p$ as the probability of winning atleast one Jackpot. Using this, a better expression is given by,

$p=p_6+(1 - p_6)(1 - (1 - p)^\bar{l}) \tag{2} \label{crudeeqn}$

where $\bar{l}=l/(1-p_6)$ is the expected number of lotteries won conditional on not winning the Jackpot directly. The above equation then means that we either win the Jackpot or get some lotteries to continue our pursuit of winning the Jackpot. Not so surprisingly, this equation can be solved in closed form. We have,

$\displaystyle p = 1 - (1 - p_6)^{1 / (1 - \bar{l})} \tag{3} \label{cruderes}$

This gives a probability of $p=298.7830666$ micro-bps (or) an Odds of about 1 in 33469097.

If we assume $p \approx 0$ in $\eqref{crudeeqn}$, then using the approximation $(1+x)^n \approx nx$ for small $x$, we end up with Matt's result. This clearly shows the hidden assumption in result derived by Matt.

But even $\eqref{crudeeqn}$ is still not exactly correct. This can still result in a value that is less than 0 or greater than 1 in certain cases which doesn't make any sense.

This problem occurs because, in the RHS of $\eqref{crudeeqn}$, we casually make an assumption that $\mathbb{E}(1-(1-p)^L)=1-(1-p)^{\mathbb{E}(L)}$ where $L$ is the random number of lotteries won (conditional on not winning the Jackpot). But this is clearly wrong because of Jensen's Inequality.

Therefore, the correct way to do this would be to solve the following equation.

$p=p_6+p_{5,2}(1-(1-p)^{25000})+p_{5,1}(1-(1-p)^{500})+p_4(1-(1-p)^{50})+p_3(1-(1-p)^{12.5})+p_2(1-(1-p))$

where we explicitly condition on the resulting coupons. Solving this, we get $p=298.782539$ micro-bps (or) an Odds of about 1 in 33469156. 

This is good but there is still one final piece of hiccup here. In case of 2 matches, we get 25 pounds which we have assumed is equal to 12.5 lotteries. But non-integer tickets doesn't make much sense.

In case 2 matches wins us 24 pounds, then we would have $p=296.70581679$ micro-bps (or) an Odds of about 1 in 33703416. Had it been 26 pounds, then we would have $p=300.888537999$ micro-bps (or) an Odds of about 1 in 33234897.

The precise way to model this is to split the problem into cases: one in which we have just a lottery and the other in which we have the lottery and a pound. If we let $p$ and $p_e$ be the probabilities of winning at least one Jackpot in the respective cases, then we have

$p=p_6+p_{5,2}(1-(1-p)^{25000})+p_{5,1}(1-(1-p)^{500})+p_4(1-(1-p)^{50})+p_3(1-(1-p_e)(1-p)^{11})+p_2(1-(1-p)) \tag{4} \label{firstsimul}$
and
$p_e=p_6+p_{5,2}(1-(1-p_e)(1-p)^{24999})+p_{5,1}(1-(1-p_e)(1-p)^{499})+p_4(1-(1-p_e)(1-p)^{49})+p_3(1-(1-p)^{13})+p_2(1-(1-p_e)) \tag{5} \label{secondsimul} $

Solving this simultaneous equations gives, $p=296.75282794$ micro-bps (or) an Odds of about 1 in 33698077 and $p_e=300.13462527$ micro-bps (or) an Odds of about 1 in 33318380.

These equations have to be solved numerically which may be compute-intensive. A compromise would be let Jensen's inequality slide and consider the following equations.

$p=p_6+(1-p_6)(1-(1-p)^{\bar{l}_{11}}(1-p_e)^{\bar{l}_{12}})$ and $p_e=p_6+(1-p_6)(1-(1-p)^{\bar{l}_{21}}(1-p_e)^{\bar{l}_{22}})$

where $l_{11}=25000p_{5,2}+500p_{5,1}+50p_4+11p_3+p_2$, $l_{12}=p_3$, $l_{21}=24999p_{5,2}+499p_{5,1}+49p_4+13p_3$ and $l_{22}=p_{5,2}+p_{5,1}+p_4+p_2$ are the (unconditional) expected lotteries and lottery-pounds in the respective cases, and $\bar{l}_k=l_k/(1-p_6)$ for $k \in \{11, 12, 21, 22\}$. 

Like before, this has the advantage of being solved in closed form. We have,

$p=1-(1-p_6)^{\bar{m}}$ and $p_e=1-(1-p_6)^{\bar{m}_e}$

where $\displaystyle \bar{m} = \frac{1-\bar{l}_{22}+\bar{l}_{12}}{(1-\bar{l}_{11})(1-\bar{l}_{22})-\bar{l}_{12}\bar{l}_{21}}$ and $\displaystyle \bar{m}_e=\frac{1-\bar{l}_{11}+\bar{l}_{21}}{(1-\bar{l}_{11})(1-\bar{l}_{22})-\bar{l}_{12}\bar{l}_{21}}$

Using the known values, we get $p=296.7533437$ micro-bps and $p_e=300.1351482$ micro-bps. Like before, if we assume $p_6$ is also small, then we get an even simpler linear relation.

Or we could assume both $p$ and $p_e$ are small in $\eqref{firstsimul}$ and $\eqref{secondsimul}$, which will give us a pair of linear simultaneous equations. Solving them would give us,

$\displaystyle p = mp_6$ and $\displaystyle p_e = m_ep_6$

where $m$ (and $m_e$) is the same as $\bar{m}$ (and $\bar{m}_e$) except with unconditional expectations. Either way, these approximations become susceptible to absurd probabilities if not applied carefully.

Now, for the next part, let's say we solve everything exactly. If we have 4 pounds, then we could buy 2 lotteries. If we assume that we don't mix the proceeds from on with the other, then the probability of winning atleast one Jackpot is $1-(1-p)^2$ which with small value approximation gives $2p$. Similarly, With 5 pounds, the probability of winning atleast one Jackpot is $p+p_e$.

But with 6 pounds, we have a choice. We could either play three individual lotteries with winning probability $3p$ or two lottery-pounds with winning probability $2p_e$. With the values at hand, playing three individual lotteries is the clear choice.

This shows that it is better to convert all cash into lotteries to the fullest extent possible. Therefore, for integer $n$, we have,

$p_{2n}=np$ and $p_{2n+1}=(n-1)p+p_e$

All of the above were calculated using the following Mathematica code.

Clear["Global`*"];
e1 = Binomial[6, 1] Binomial[53, 5] / Binomial[59, 6];
e2 = Binomial[6, 2] Binomial[53, 4] / Binomial[59, 6];
e3 = Binomial[6, 3] Binomial[53, 3] / Binomial[59, 6];
e4 = Binomial[6, 4] Binomial[53, 2] / Binomial[59, 6];
e51 = Binomial[6, 5] Binomial[53, 1] / Binomial[59, 6] * 52 / 53;
e52 = Binomial[6, 5] Binomial[53, 1] / Binomial[59, 6] * 1 / 53;
e6 = Binomial[6, 6] Binomial[53, 0] / Binomial[59, 6];

f[p_] := e6 + e52 (1 - Power[1 - p, 25000]) + e51 (1 - Power[1 - p, 500]) + e4 (1 - Power[1 - p, 50]) + e3 (1 - Power[1 - p, 25 / 2]) + e2 p - p;
g[p1_, p2_] := {e6 + e52 (1 - Power[1 - p1, 25000]) + e51 (1 - Power[1 - p1, 500]) + e4 (1 - Power[1 - p1, 50]) + e3 (1 - (1 - p2) Power[1 - p1, 11]) + e2 p1,
e6 + e52 (1 - (1 - p2) Power[1 - p1, 24999]) + e51 (1 - (1 - p2) Power[1 - p1, 499]) + e4 (1 - (1 - p2) Power[1 - p1, 49]) + e3 (1 - Power[1 - p1, 13]) + e2 p2};

(* NSolve[p == e6 + e52 (1 - Power[1 - p, 25000]) + e51 (1 - Power[1 - p, 500]) + e4 (1 - Power[1 - p, 50]) + e3 (1 - Power[1 - p, 12]) + e2 p && 0 < p < 1, p] 
NSolve[p == e6 + e52 (1 - Power[1 - p, 25]) + e51 (1 - Power[1 - p, 5]) + e4 (1 - Power[1 - p, 3]) + e3 (1 - Power[1 - p, 2]) + e2 p && 0 < p < 1, p] *)

(* n = 1; nmax = 1000;
a = N[0, 50]; b = N[1 / 1024 / 1024, 50]; tol = Power[10, -75];
While[n <= nmax,
    c = (a + b) / 2; temp = f[c];
    If[Or[(b - a) / 2 < tol, Abs[temp] < tol tol], Break[];];
    n += 1;
    If[temp > 0, a = c;, b = c;];
];
c *)

(* With only Jackpot and extra lottery *)
p = e6 / (1 - e2);
N[p, 15]
Floor[(1 - p) / p]

(* Matts Solution *)
l = e2 + (25 / 2) e3 + 50 e4 + 500 e51 + 25000 e52;
p = e6 / (1 - l);
N[p, 20]
Floor[(1 - p) / p]

(* Ignoring Jensens *)
lb = l / (1 - e6);
p = 1 - Power[1 - e6, 1 / (1 - lb)];
N[p, 20]
Floor[(1 - p) / p]

(* FindRoot[f[p1], {p1, 1}, WorkingPrecision -> 25] *)

(* With 25 units *)
p = 2.9878253945 Power[10, -8];
N[p, 20]
Floor[(1 - p) / p]

(* With 24 units *)
p = 2.9670581679 Power[10, -8];
N[p, 20]
Floor[(1 - p) / p]

(* With 26 units *)
p = 3.00888537999 Power[10, -8];
N[p, 20]
Floor[(1 - p) / p]

(* With 25 units but with extra lottery*)
(* FindRoot[g[p1, p2] == {p1, p2}, {p1, 1}, {p2, 1}, WorkingPrecision -> 25] *)
p = 2.9675282794 Power[10, -8];
Floor[(1 - p) / p]
p = 3.0013462527 Power[10, -8];
Floor[(1 - p) / p]

(* Ignoring Jensens *)
l11 = 25000 e52 + 500 e51 + 50 e4 + 11 e3 + e2; l12 = e3;
l21 = 24999 e52 + 499 e51 + 49 e4 + 13 e3;l22 = e52 + e51 + e4 + e2;
l11b = l11 / (1 - e6); l12b = l12 / (1 - e6); l21b = l21 / (1 - e6); l22b = l22 / (1 - e6);
bdeno = (1 - l11b) (1 - l22b) - l12b l21b;
mb = (1 - l22b + l12b) / bdeno; meb = (1 - l11b + l21b) / bdeno;
p = 1 - Power[1 - e6, mb];
N[p, 15]
Floor[(1 - p) / p]
pe = 1 - Power[1 - e6, meb];
N[pe, 15]
Floor[(1 - pe) / pe]

(* Small value approximations *)
deno = (1 - l11)(1 - l22) - l12 l21;
N[e6 (1 + (l12 - l22)) / deno, 10];
N[e6 (1 + (l21 - l11)) / deno, 10];
m = (1 - l22 + l12) / deno;
me = (1 - l11 + l21) / deno;
N[e6 m, 10]
N[e6 me, 10]

(* temp[n_] := temp[n] = If[n == 2, p, pe];
Calc[n_] := Module[{res = {}, m = IntegerPartitions[n, All, {2, 3}]},
    If[n < 2, Return[0];];
    Do[
        res = Join[res, {1 - Times @@ (1 - temp /@ k)}];
    , {k, m}];
    Max[res]
];
Power[10, 10] Table[Calc[n], {n, 30}] *)


Hope you enjoyed the discussion. See you in the next post.

Yours Aye
Until then
Me