## Thursday, December 5, 2019

### Bitwise Transforms

Bitwise operators have always fascinated me. There are a number of cool tricks and tips that one can use in programming to shorten the code and even elegant, I would say. To start with, we have Bit twiddling hacks that contains a huge list. I've used some of them in my Nim Multiplication code.

But this code is not about using those tricks. Bitwise operations are interesting from mathematical viewpoint as well. For example, the one that has always stumped me was the Hadamard Transform that are intimately connected with the Bitwise XOR. In one of their wide variety of applications, they play a great role in XOR transforms.

Diving straight into the point, if we have two arrays $\mathbf{a}=(a_0,a_1,a_2,\cdots)$ and $\mathbf{b}=(b_0,b_1,b_2,\cdots)$ and we define the convolution as an array $\mathbf{c}=(c_0,c_1,c_2,\cdots)$ such that,

$c_n=\displaystyle\sum_{i*j=n}a_ib_j$

where '*' is any binary operation. For example, when the binary operation is 'addition', then we have the most commonly used Discrete convolution. If the operation is 'multiplication', we have what we call the Dirichlet convolution.

We can also the Bitwise operations as the binary operation. Most famous of them would be the 'Bitwise XOR' which results in the Dyadic convolution. Similarly, we can define OR convolution and AND convolution using Bitwise-OR and Bitwise-AND as the binary operations respectively.

If evaluated directly, all these convolutions would lead to $O(n^2)$ time complexity. A common technique in making this task easier is defining an easily invertible matrix $\mathbf{M}$ such that

$\mathbf{M}\mathbf{c}=(\mathbf{M}\mathbf{a})\odot(\mathbf{M}\mathbf{b})$

where $\odot$ is the element-wise multiplication. For example, $\mathbf{M}$ is the Vandermonde matrix in case of the Discrete convolution.

For the Dyadic convolution (or the XOR convolution), the matrix we are looking for are the Hadamard matrices, $H$. We can construct the matrices with $H=(1)$ and iteratively constructing the higher order matrices with either of the following two matrices

$H \to \begin{pmatrix} H & H \\ H & -H \end{pmatrix}$ (or) $H \to \begin{pmatrix} H & -H \\ H & H \end{pmatrix}$

though the first seems to the most popular choice. Hadamard matrices are very simple in that they are their own inverse which makes inverse part just like the transform. In my opinion, this is very analogous to how XOR can as both addition and subtraction.

The idea in this post is to find the relevant matrices for OR and AND convolutions. Fiddling with the first few terms of the matrix equation we saw before with $\mathbf{c}$ defined as OR/AND accordingly, it is not so hard to come with the following.

$\mathbf{M}_{\text{OR}} \to \begin{pmatrix} \mathbf{M}_{\text{OR}} & \mathbf{0} \\ \mathbf{M}_{\text{OR}} & \mathbf{M}_{\text{OR}} \end{pmatrix}$ (or) $\mathbf{M}_{\text{OR}} \to \begin{pmatrix} \mathbf{M}_{\text{OR}} & \mathbf{M}_{\text{OR}} \\ \mathbf{M}_{\text{OR}} & \mathbf{0} \end{pmatrix}$

and

$\mathbf{M}_{\text{AND}} \to \begin{pmatrix} \mathbf{M}_{\text{AND}} & \mathbf{M}_{\text{AND}} \\ \mathbf{0} & \mathbf{M}_{\text{AND}} \end{pmatrix}$ (or) $\mathbf{M}_{\text{AND}} \to \begin{pmatrix} \mathbf{0} & \mathbf{M}_{\text{AND}} \\ \mathbf{M}_{\text{AND}} & \mathbf{M}_{\text{AND}} \end{pmatrix}$

We can choose either of the matrices for OR/AND. I prefer the first matrices in both cases mostly because the first versions have determinant 1.

Having done that, we follow up with the inverse matrices, $\mathbf{N}_{\text{OR}}$ and $\mathbf{N}_{\text{AND}}$, which can also be iteratively built.

$\mathbf{N}_{\text{OR}} \to \begin{pmatrix} \mathbf{N}_{\text{OR}} & \mathbf{0} \\ -\mathbf{N}_{\text{OR}} & \mathbf{N}_{\text{OR}} \end{pmatrix}$ and   $\mathbf{N}_{\text{AND}} \to \begin{pmatrix} \mathbf{N}_{\text{AND}} & -\mathbf{N}_{\text{AND}} \\ \mathbf{0} & \mathbf{N}_{\text{AND}} \end{pmatrix}$

Now these can be used to answer questions like finding the number of sequences containing numbers from $0$ to $k$ with a given OR-sum (or AND-sum).

I think one of the reason that the Dyadic (XOR) convolution is more famous is because of their application in nim games. Unfortunately, OR and AND convolutions does not have any direct applications that I'm aware of. Still, I found these beautiful and worth sharing.

Just to conclude the story, to solve the LCM-convolution between two functions, we dirichlet-convolve each of them with the constant function. Now we simply do a pointwise multiplication and dirichlet-convolve the result with Mobius function. This is well explained in this stackexchange question and Multiplicative Arithmetic Functions of SeveralVariables: A Survey. These also discuss GCD-convolution but I don't understand them yet.

UPDATE: I always knew that I may not be the first ones to come up with the corresponding matrices for OR and AND convolution but I was a little disappointed when I found Serbanology when I was reading All the good tutorials found for Competitive programming.

Until then
Yours Aye
Me

## Monday, October 14, 2019

### Nim Multiplication

I was recently working on a problem that need the idea of nim-multiplication. While nim-addition plays a nice role in all 1D combinatorial games, nim-multiplication takes its place in their 2D counterparts. An excellent read on the subject of Impartial Combinatorial Games can be found in Thomas S. Ferguson's Game Theory.

Now, back to nim-multiplication. While nim-addition, denoted by $\oplus$, is easy to calculate in a computer with a simple BitXOR operation, nim-multiplication, denoted by $\otimes$, is not as simple. Every source I found in the Internet listed the following two rules to simplify nim-multiplication.

• Rule 1: The nim-product of a Fermat power and any smaller number is just their ordinary product.
• Rule 2: The nim-product of a Fermat power with itself is 3/2-th of the Fermat power.
These two rules, in addition to the fact that nim-product is distributive w.r.t nim-addition and the commutative property of nim-multiplication, certainly simplifies the task.

To further simplify the task, we can make use of the following two additional rules.
• Rule 3: For any sequence of non-negative integers $p_0<p_1<p_2<\cdots$, we have $2^{2^{p_0}}\otimes2^{2^{p_1}}\otimes2^{2^{p_2}}\otimes\cdots=2^{2^{p_0}}\cdot2^{2^{p_1}}\cdot2^{2^{p_2}}\cdot\cdots$. This also shows that any power of 2 can be written as nim-product of distinct Fermat powers by writing the exponent as sums of powers-of -2.
• Rule 4:If $x\text{ AND }y=0$, then $x\oplus y=x + y$
• Rule 5: If $x\text{ AND }y=0$, then $2^x\otimes 2^y=2^x\cdot2^y$
Rule 3 works because we can repeatedly apply Rule 1 for nim-products starting from the left and convert them to ordinary products. Note that any resulting term on the left will always be smaller than the term that comes after.

Rule 4 works because the AND condition makes sure that no bits are identical in the binary representation of $x$ and $y$.

Rule 5 works because the nim-product can be converted to nim-products of Fermat powers. The AND condition makes sure that no two Fermat powers are equal and hence we can sort the Fermat powers and apply Rule 3.

Before solving for the nim-multiplication problem, let's just revisit some properties of the $\text{AND}$ product. It is easy that $\text{AND}$ satisfies the following three properties.

• $x\text{ AND }x=x$
• Associative: $x\text{ AND }(y \text{ AND }z)=(x\text{ AND }y)\text{ AND }z$
• Commutative: $x\text{ AND }y=y\text{ AND }x$

Using these properties, we can show that

• P1: $(x\text{ AND }y)\text{ AND }(x-(x\text{ AND }y))=0$
• P2: $(x\text{ AND }y)\text{ AND }(y-(x\text{ AND }y))=0$
• P3: $(x-(x\text{ AND }y))\text{ AND }(y-(x\text{ AND }y)) = 0$

Now the nim-product of arbitrary numbers can be split into several nim-products of powers-of-2 because
$x\otimes y=(2^{x_0}\oplus2^{x_1}\oplus2^{x_2}\oplus\cdots)\otimes(2^{y_0}\oplus2^{y_1}\oplus2^{y_2}\oplus\cdots)=\displaystyle\sum_{i,j \geq 0}2^{x_i}\otimes 2^{y_j}$
where the summation symbol should be understood as a nim-sum.

Let's first solve for nim-squares of powers-of-2. That is a nim-product of a power-of-2 with itself.

\begin{align}
2^x\otimes 2^x & = (2^{x-(x\text{ AND }-x)}\cdot 2^{x\text{ AND }-x})\otimes (2^{x-(x\text{ AND }-x)}\cdot 2^{x\text{ AND }-x}) \\
& = (2^{x-(x\text{ AND }-x)}\otimes 2^{x\text{ AND }-x})\otimes (2^{x-(x\text{ AND }-x)}\otimes 2^{x\text{ AND }-x})  \\
& = (2^{x\text{ AND }-x}\otimes 2^{x\text{ AND }-x})\otimes (2^{x-(x\text{ AND }-x)}\otimes 2^{x-(x\text{ AND }-x)}) \\
& = (3/2)2^{x\text{ AND }-x}\otimes(2^{x-(x\text{ AND }-x)}\otimes 2^{x-(x\text{ AND }-x)}) \\
\end{align}
We could convert ordinary product in the second equality into an nim-product because the AND product of the exponents can be clearly seen to be zero by using  $y=-x$ in P1.

Now for the nim-product of two distinct powers-of-2,

\begin{align}
2^x\otimes 2^y & = (2^{x-(x\text{ AND }y)}\cdot 2^{x\text{ AND }y})\otimes (2^{y-(x\text{ AND }y)}\cdot 2^{x\text{ AND }y}) \\
& = (2^{x-(x\text{ AND }y)}\otimes 2^{x\text{ AND }y})\otimes (2^{y-(x\text{ AND }y)}\otimes 2^{x\text{ AND }y})  \\
& = (2^{x-(x\text{ AND }y)}\otimes 2^{y-(x\text{ AND }y)})\otimes (2^{x\text{ AND }y}\otimes 2^{x\text{ AND }y}) \\
& = 2^{x-(x\text{ AND }y)}\cdot 2^{y-(x\text{ AND }y)}\otimes (2^{x\text{ AND }y}\otimes 2^{x\text{ AND }y})\\
\end{align}
In the first equality, we have used both P1 and P2 to convert from ordinary product to nim-product. In the last equality, we have used P3 to do the opposite.

The two results above together form the recursion that can be used to solve the nim-multiplication problem. Together with the facts that the nim-product of any number with 0 is 0 and the nim-product of a number with 1 is the number itself, it's not so hard to code the above for nim-multiplication.

UPDATE: My python implementation of the above algorithm.

def memoize(fn):
"""returns a memoized version of any function that can be called
with the same list of arguments.
Usage: foo = memoize(foo)"""

def handle_item(x):
if isinstance(x, dict):
return make_tuple(sorted(x.items()))
elif hasattr(x, '__iter__'):
return make_tuple(x)
else:
return x

def make_tuple(L):
return tuple(handle_item(x) for x in L)

def foo(*args, **kwargs):
items_cache = make_tuple(sorted(kwargs.items()))
args_cache = make_tuple(args)
if (args_cache, items_cache) not in foo.past_calls:
foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
return foo.past_calls[(args_cache, items_cache)]
foo.past_calls = {}
foo.__name__ = 'memoized_' + fn.__name__
return foo

@memoize
def nimmult(x, y):
if x < y:
return nimmult(y, x)
elif y == 0:
return 0
elif y == 1:
return x
elif x & (x - 1):
return nimmult((x & - x), y) ^ nimmult(x - (x & - x), y)
elif y & (y - 1):
return nimmult(x, (y & - y)) ^ nimmult(x, y - (y & - y))
elif x in [2, 4, 16, 256, 65536, 4294967296]:
return 3 * x // 2 if x == y else x * y
elif x == y:
tempx, p = x, 0
while tempx > 1:
tempx >>= 1
p += 1
p = p & -p
p = 1 << p
return nimmult(3 * p // 2, nimmult(x // p, y // p))
else:
tempx, tempy, px, py = x, y, 0, 0
while tempx > 1:
tempx >>= 1
px += 1
while tempy > 1:
tempy >>= 1
py += 1
temp = px & py
temp = 1 << temp
return nimmult(x // temp * y // temp, nimmult(temp, temp))

I've included the memoize operator that I usually use in the code. In summary, Yes.. It's as simple as that :)

Until then
Yours Aye
Me

## Saturday, September 21, 2019

### A simple problem in Geometric probability

I recently came across a Stackexchange post that asks for the probability that a point chosen randomly inside an equilateral triangle is closer to the center than to any of its edges. The square case was Problem B1 in 50th Putnam 1989.

I thought what about a general n-gon. Obviously for a circle, the answer should be 1/4. To tackle, the general case, we can consider the center of the polygon to be the origin and orient the polygon in such a way that one of its edges is perpendicular to the positive $y$-axis.

We can scale the polygon such that this edge passes through $(0,1)$. In other words, we scale the polygon such that the inradius is $1$. With the polygon scaled to a unit-inradius, the side of the polygon becomes $2\tan(\pi/n)$.

Now, we know that the locus of the points that are equidistant from the origin and to the line $y=1$ is a parabola. Using the formulae for distance between two points and distance between a point and a line, we have,

$x^2+y^2=\displaystyle\frac{(y-1)^2}{1^2+0^2}$

The above formulae is the same one given in this Wikipedia page. Simplifying the above we find the equation of the parabola as $y=(1-x^2)/2$.

Using the symmetry of the $n$-gon, we only focus only on the triagulation of the polygon formed by the origin and the edge that lies along the line $y=1$. The area of one such triangulation can be easily seen as $\tan(\frac{\pi}{n})$.

The lines through the origin bounding this edge are $y=\pm x \tan(\pi/2-\pi/n)=\pm x \cot(\frac{\pi}{n})$. From now on, we restrict ourselves to the positive sign again using symmetry to our advantage.

The point of intersection of the line $y=\cot(\frac{\pi}{n})x$ and the parabola $y=(1-x^2)/2$. Solving the resulting quadratic we can simplify the positive root as $x=\tan(\frac{\pi}{2n})$. Therefore the point of intersection is $(\tan(\frac{\pi}{2n}),\tan(\frac{\pi}{2n})\cot(\frac{\pi}{n}))$.

To simplify notation, let $t=\tan(\frac{\pi}{2n})$ from now on. Therefore, the line becomes $y=\pm \frac{1-t^2}{2t}x$, the point of intersection becomes $(t,\frac{1-t^2}{2})$ and the area of the traingulation becomes $\frac{2t}{1-t^2}$.

Now, the area bounded by the parabola and the $x$-axis, is given by,

$A=\displaystyle\int_0^t \frac{1-x^2}{2}\,dx=\frac{t}{6}(3-t^2)$

and

the area of the triangle formed by origin, the point $x=t$ and the point of intersection of the parabola and the line is $T=\frac{1}{2}\cdot t \cdot \frac{1-t^2}{2}$.

In each of the triangulations of the $n$-gon, the radom point will be closer to the center than to the edge if it falls within the region bounded by the lines $y=\pm \frac{1-t^2}{2t}x$ and the parabola. If we denote the probability of such an event by $p_n$, then

$p_n=\displaystyle\frac{2(A-T)}{\tan(\frac{\pi}{n})}=\frac{(t^2+3)(1-t^2)}{12}$

Substituting for $t$ and simplifying, we have $p_n=\frac{1}{12}(4-\sec^4(\frac{\pi}{2n}))$

Until then
Yours Aye
Me

## Saturday, August 17, 2019

### Estimating a sum with Probability

Let's say we want to find the the sum
$S=\displaystyle\sum_{k=1}^nf(k)$
but we don't have a closed form or any easier way to calculate this. One way of doing this is to use the idea of approximating an integral.

We can take choose some $m$ random values $0=X_0,X_1,X_2,\cdots,X_m$ between $1$ and $n$ such that $0=X_0<X_1<X_2<\cdots,X_m\leq n$ and use the following expression to get an approximation.

$S_r=\displaystyle\sum_{i=1}^m f(X_i)(X_i-X_{i-1})$

But how good is this approximation? If we define the error as $\Delta=S-S_r$, what is $\mathbb{E}(\Delta)$? In this post, we are precisely trying to answer that.

Let's first calculate $\mathbb{E}(S_r)$. Using linearity of expectation, we can find the expected value of each term in the expression for $S_r$.

$\mathbb{E}(f(X_i)(X_i-X_{i-1}))=\displaystyle\frac{1}{\binom{n}{m}}\sum_{x_1=1}^{x_2-1}\sum_{x_2=1}^{x_3-1}\cdots\sum_{x_{i-1}=1}^{x_i-1}\sum_{x_i=1}^n\sum_{x_{i+1}=x_i+1}^n\cdots\sum_{x_{m-1}=x_{m-2}+1}^n\sum_{x_{m}=x_{m-1}+1}^nf(x_i)(x_i-x_{i-1})$

To simplify this sum, we need the following beautiful result.

$\displaystyle\sum_{x_1=1}^{x_2-1}\sum_{x_2=1}^{x_3-1}\cdots\sum_{x_k=1}^{x_{k+1}-1}1=\frac{(x_{k+1}-1)_{(k)}}{k!}$ and $\displaystyle\sum_{x_2=x_1+1}^n\sum_{x_3=x_2+1}^n\cdots\sum_{x_{k+1}=x_k+1}^n1=\frac{(n-x_1)_{(k)}}{k!}$

Note that how they mimic repeated integration of the constant function. Using the above results, we first simplify the expectation to

\begin{align} \mathbb{E}(f(X_i)(X_i-X_{i-1}))&=\displaystyle\frac{1}{\binom{n}{m}}\sum_{x_{i-1}=1}^{x_i-1}\sum_{x_i=1}^nf(x_i)(x_i-x_{i-1})\frac{(x_{i-1}-1)_{(i-2)}}{(i-2)!}\frac{(n-x_i)_{(m-i)}}{(m-i)!}\\ &=\frac{1}{\binom{n}{m}}\sum_{x_i=1}^nf(x_i)\frac{(x_i)_i}{i!}\frac{(n-x_i)_{(m-i)}}{(m-i)!}\\ &=\frac{1}{\binom{n}{m}}\sum_{k=1}^nf(k)\frac{(k)_i}{i!}\frac{(n-k)_{(m-i)}}{(m-i)!}\\ &=\frac{1}{(n)_{(m)}}\sum_{k=1}^nf(k)\binom{m}{i}(k)_i(n-k)_{(m-i)}\\ \end{align}

Therefore,
\begin{align} \mathbb{E}(S_r)&=\displaystyle\frac{1}{(n)_{(m)}}\sum_{i=1}^m\sum_{k=1}^nf(k)\binom{m}{i}(k)_i(n-k)_{(m)}\\ &=\frac{1}{(n)_{(m)}}\sum_{k=1}^nf(k)\bigl((n)_{(m)}-(n-k)_{(m)}\bigl)\\ &=S-\frac{1}{(n)_{(m)}}\sum_{k=1}^nf(k)(n-k)_{(m)}\\ \end{align}

where we have the used the falling factorial analogue of the binomial theorem in the second equality. Now that we can have the result for the expected value of the error as,

\begin{align} \mathbb{E}(\Delta)&=\displaystyle\frac{1}{(n)_{(m)}}\sum_{k=1}^{n-m}f(k)(n-k)_{(m)}=\frac{1}{\binom{n}{m}}\sum_{k=1}^{n-m}f(k)\binom{n-k}{m} \end{align}

That's pretty nice, isn't it?

I find this random sums extremely good in certain cases. Especially, I was surprised when I tried to approximate the sum of totients and the divisor function. Considering that these number theoretic functions are themselves apparently random, the expected relative errors are surprising small. You should try them out for yourselves.

Until then
Yours Aye
Me

## Thursday, June 6, 2019

### (A family of) Integrals independent of a parameter

I recently came across a very nice wordpress page. There are a plenty of nice math in that page that I thoroughly enjoyed, especially the Distances to a line from the vertices of a regular polygon. In this post, we look at one of the results based on Integrals not being dependent on a parameter from the same wordpress page.

The author shows that

$\displaystyle \int_0^\infty f(x^2)\,dx=\int_0^\infty f((x-a/x)^2)\,dx$,    $\text{Re}(a)\leq0$

This result can be further generalized. In fact, the presence of the square term in the facilitates the proof of the result via differential under the integral sign (sadly, only via that route), but that doesn't mean we can't take a different route to the general result. Taking a cue from Problem B4 of 29th Putnam 1968, we show that,

$\displaystyle \int_{-\infty}^\infty f(x)\,dx=\int_{-\infty}^\infty f(x-a/x)\,dx$ provided one of them exists.

For example, here are the WolframAlpha results for $\sin x / x$ and $\sin(x-1/x)/(x-1/x)$. It's surprising that while Mathematica calculates the first case exactly as $\pi$ in the blink of an eye, it fails to evaluate the second.

Let $I=\displaystyle \int_{-\infty}^\infty f(x)\,dx$

Substitute $x=y-a/y$. Now comes the cute part. Note that when $y=0$, $x=-\infty$ and when $y=\infty$, $x=\infty$. Also, $dx=(1+\frac{a}{y^2})dy$

Therefore,

\begin{align} I&=\displaystyle \int_0^\infty f(y-a/y)\left(1+\frac{a}{y^2}\right)\,dy\\ &=\displaystyle \int_0^\infty f(y-a/y)\,dy+\displaystyle \int_0^\infty f(y-a/y)\frac{a}{y^2}\,dy\\ \end{align}

Now, we make the substitution, $t=-\frac{a}{y}$ in the second integral. Therefore, $dt=\frac{a}{y^2}dy$. Also, $y=0,t=-\infty$ and $y=\infty,t=0$. Hence,

\begin{align} I&=\displaystyle \int_0^\infty f(y-a/y)\,dy+\displaystyle \int_{-\infty}^0 f(-a/t+t)\,dt\\ &=\displaystyle \int_{-\infty}^\infty f(x-a/x)\,dx \end{align}

which proves the result.

In any case, just like the author of the wordpress page, I too was surprised by the fact that the transformation $x \to x-a/x$ changes the shape of the curve drastically keeping the area constant. A non-calculus based proof of this result would be a feast to the senses I guess but I couldn't find it yet. See ya again with a nice post.

Until then
Yours Aye
Me

## Saturday, October 27, 2018

### German Tank problem

I was recently watching How to estimate a population using statisticians in Youtube and this reminded of the German tank problem. The problem has a very nice history and I started reading about it in Wikipedia after I watched the video.

I found it very interesting that the problem can be achieved using either frequentist inference or Bayesian inference, achieving different results. Estimating the size of a Population by gives the proof of the famous result that the number of tanks.

Let the random variables $N$ denote the unknown number of tanks, $M$ be the maximum serial number observed and $K$ be the sample size. Then given $M=m$ and $K=k$,

$\displaystyle N \approx \frac{k+1}{k}m-1$ by frequentist analysis and
$\displaystyle \mathbb{E}(N|M=m,K=k)=(m-1)\frac{k-1}{k-2}$ by Bayesian analysis.

Both of these estimates are based on the probability $M=m$ is the maximal among $N$ variables with a sample size of $K$. That is,
$\displaystyle \mathbb{P}(M=m|N,K)=\frac{\binom{m-1}{K-1}}{\binom{N}{K}}$

Now let's think differently. The history of the German tank problem meant that the serial number of the tanks were noted after they were captured or destroyed. But what if they weren't? Assume that the serial number were noted without the tanks being captured/destroyed. In other words, there is a possibility of the observed numbers being repeated. Therefore,

$\displaystyle \mathbb{P}(M=m|N,K)=\frac{\binom{K+m-1}{m-1}-\binom{K+m-2}{m-2}}{\binom{K+N-1}{N-1}}=\frac{K}{m-1}\frac{\binom{K+m-2}{m-2}}{\binom{K+N-1}{N-1}}$

Surprisingly, even in this case the total likelihood function remains the same as in the original German tank problem. That is,

$\displaystyle \sum_{n}\mathcal{L}(n)=\frac{k}{k-1}$ using the sum $\displaystyle \sum_{n=m}^\infty \frac{1}{\binom{k+n-1}{n-1}}=\frac{k}{m}\frac{1}{\binom{k+m-2}{k-2}}$

Proceeding like in the wiki article, this we have,

$\displaystyle N \approx \frac{k+1}{k}m-\frac{1}{m}$ by frequentist analysis and
$\displaystyle \mathbb{E}(N|M=m,K=k)=m\frac{k-1}{k-2}$ by Bayesian analysis.

Now coming back to the video, I quoted at the start of this post, it is clear that the population estimation is done with the famous capture-recapture method. This is basically the setting usually described in terms of Hypergeometric distribution. The estimate of the total population in this case is given by,

$\displaystyle N \approx \frac{nK}{k}$

where $K$ is the initial sample size, $n$ is the second sample size and $k$ is the number of elements that are recaptured.

But this is based on frequentist inference. What about the Bayseian inference? Turns out we would need the following sum to come up with the total likelihood function.

$\displaystyle \sum_{N \geq \text{max}(n,K)}\frac{\binom{N-K}{n-K}}{\binom{N-m}{n-m}}=\frac{n-m}{K-m-1}\frac{1}{\binom{K-m-2}{k-m-2}}$

Using we have, $\displaystyle \sum_{n}\mathcal{L}(n)=\frac{nK}{k(k-1)}$ and therefore $\displaystyle \mathbb{E}(N)=\frac{(n-1)(K-1)}{k-2}$ as given here.

Now, the final part. Matt does something interesting in the video. He uses the mark-and-recapture method to estimate the number of candies in a bowl by repeated sampling. How do we go about that?

Let $n_0$ be the total number of candies in the bowl that we are trying to estimate. To this end, we capture a bunch of them each time and mark how many times those candies were captured. Let's say we do this $k$ times and $n_1,n_2,\cdots,n_k$ be sample sizes. Let $X_j$ denote the number of candies that were captured $j$ times. Now we have,

$\displaystyle \mathbb{E}(X_j)=n_0[t^j]\prod_{i=1}^k \left(1-\frac{n_i}{n_0}+ t\frac{n_i}{n_0}t\right)$

Using the data given by Matt in the video, we observe that $2$ candies were capture $4$ times. Using this as an estimate for $\mathbb{E}(X_4)$ and solving resulting equation for $n_0$ and picking the maximal root, we have $n_0 \approx 164.3$ whereas the bowl originally had about $169$ candies.

The above formulae is based on frequentist inference. It's not so hard to arrive at the $\mathbb{P}(X_j=m)$ with recurrence but finding a closed form, which I think will be needed in Bayesian inference, remains a challenge. I'll update if I'm successful in that front.

Until then,
Yours Aye
Me

## Monday, April 30, 2018

### The Birthday Problem

The Birthday problem is one of the classic problems in Probability theory concerning the probability that, in a set of $n$ randomly chosen people, some pair will have a the same birthday. The idea in this post is to generalize this setup and find the expected value in this generalization.

The generalization here is this: Assume that there is a very long line of persons ready to enter a very large room one by one. Each person is let in and declares her birthday upon entering the room. How many people, on average, must enter in order to find $k$ people that have birthdays that are $d$ days apart (given an $r$-day year)?

Non-Spoiler: Some readers may recognize that this Problem 584 of Project Euler. But note that this post neither gives the answer nor the right way to solve that problem. All we are going here is find an approximation of the required value which is far off from the correct answer.

Well, when we say 'approximation', we are immediately reminded of the Poisson approximation of the Birthday problem in Wikipedia. But the idea here is that most of the approximations I found in the web concerns only about the probability and not about the expected value.  A special case of this problem is solved in Strategic Practice 5 of the (amazing) Stat 110 course by the (extraordinary) Prof. Joseph Blitzstein. And I got curious as to how good the approximation really is.

Let $\mathbb{E}(k,d,r)$ be the expected number of people that have to enter room such that we have $k$ people with birthdays $d$ days apart for the first time.

Let $X_n$ be the number of $k$-set birthday matches in when $n$ people have entered the room. By the Poisson paradigm, $X_n$ can be considered to be a poisson variable. The parameter $\lambda_n$ for this variable is given by,

$\displaystyle\lambda_n=\displaystyle\binom{n}{k}\cdot\frac{1}{(k-1)!}\left(\frac{2d+1}{r}\right)^{k-1}$  (why?)

Therefore we immediately have, $\mathbb{P}(X_n=0)=e^{-\lambda_n}$

Now by a simple trick common in evaluating expected values, we have,

\begin{align} \displaystyle\mathbb{E}(k,d,r)&=k+\sum_{n=k}^\infty\mathbb{P}(X_n=0)=k+\displaystyle\sum_{n=k}^\infty e^{-\lambda_n} \end{align}

Though we have an infinite sum, the numbers decrease rapidly because of the exponential factor. For the values given the said Project euler problem, this approximation gives $\mathbb{E}_{\text{Poisson}}(3,1,10)\approx 6.15092$ and $\mathbb{E}_{\text{Poisson}}(3,7,100)\approx 8.79751$

But we can take it a little further. When the parameter of a Poisson variable get very large, then the Normal variable with the same mean and variance as the Poisson variable gives better accuracy. Therefore,

\begin{align} \displaystyle\mathbb{E}(k,d,r)&=k+\sum_{n=k}^\infty\mathbb{P}(X_n=0)\\ &=k+\displaystyle\sum_{n=k}^\infty \mathbb{P}(-1/2 \leq X_n \leq 1/2) \end{align}

where $X_n \sim \mathcal{N}(\lambda_n,\lambda_n)$ along with the continuity correction.

Here again, the numbers decrease fast and the sum converges in about fifty terms. Now we have $\mathbb{E}_{\text{Normal}}(3,1,10)\approx 5.68554$ and $\mathbb{E}_{\text{Normal}}(3,7,100)\approx 8.04791$

In fact for the required answer, the normal approximation gives $\mathbb{E}_{\text{Normal}}(4,7,365)\approx 33.6396$ with a relative error of just about $2.5\%$. Not bad.

These values were calculated with the following Mathematica code.

Clear["Global`*"];
PoissonApprox[k_, d_, r_] :=
k + Sum[N[
Exp[-Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/
Factorial[k - 1]]], {n, k, 100}];
NormalApprox[k_, d_, r_] :=
k + Sum[N[
CDF[NormalDistribution[
Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/Factorial[k - 1],
Sqrt[Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/
Factorial[k - 1]]], 1/2] -
CDF[NormalDistribution[
Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/Factorial[k - 1],
Sqrt[Binomial[n, k] Power[(1 + 2 d)/r, k - 1]/
Factorial[k - 1]]], -1/2]], {n, k, 100}]
PoissonApprox[4, 7, 365]
NormalApprox[4, 7, 365]

Hope you enjoyed this. See you again with a nice problem.

Until then
Yours Aye
Me