Wednesday, March 18, 2020

A note on Berlekamp Massey Algorithm


Let's say we are given a sequence of $2n$ terms and were told that these are generated by a linear recurrence of order $n$ with the first $n$ terms of the sequence being the initial terms. More specifically, given $n$ terms, we want a recurrence that generates the last $\lfloor \frac{n}{2}\rfloor$ terms with the first $\lceil \frac{n}{2} \rceil$ terms.

The problem of finding the recurrence given the terms can be solved by Berlekamp Massey (BM) Algorithm. Though there are not many sources that I could find for solving recurrences with BM, I found the CodeForces entry that explains much of the details. Yet another source that I found very interesting is Berlekamp Massey algorithm.

To make the algorithm more clear than the quoted page, let's take an example. Let the sequence of terms be $[1,2,4,10,24,50,124]$.

Looking at (only) the first three terms, we make a first guess that

$\text{Guess 1:}\quad a_n=2a_{n-1}$ (or) $a_n-2a_{n-1}=0$

But when we include the fourth term, we run into a problem. We have a discrepancy.

$\Delta_4=\text{Actual}-a_4=10-2a_3=10-8=2$

We stare into the (only) first four terms of the sequence and come with another guess.

$\text{Guess 2:}\quad a_n=2a_{n-1}+a_{n-2}$ (or) $a_n-2a_{n-1}-2a_{n-2}=0$

And this correctly generates the fourth term.

(two tiny details at this point for the curious minded - starts here...)
First, someone could've guessed $a_n=2a_{n-1}+2a_{n-3}$. That works too. Continuing with the BM algorithm, we may end up with different answers but both recurrences will correctly generate the last three terms using the first four terms. Try it!

Second, though our second guess correctly predicts $a_4$, it fails for $a_3$. This is not a problem as Flawed guesses, especially for the first terms in the first half, gets 'course-corrected' at later stage of the algorithm. In the final algorithm, we only be making guesses, that too trivial guesses, at the initial stage.
(... and ends here)

We try it for the next term and we see the recurrence correctly generates fifth term. We're good. We try it for the next term and we run into problem. We have a discrepancy of $8(=58-50)$.

But this time we don't have to guess anything more. Note that we re-write our 'Guess 2' as

$\text{Guess 2:}\quad a_n=2a_{n-1}+a_{n-2}+k(a_{n-2}-2_{n-3})$

for any real $k$.

Where did the terms in the bracket come from? It our first guess shifted by two terms.

Why have we shifted it by two terms? Because we are currently finding a discrepancy at the sixth term and our last guess created a discrepancy at fourth term.

Will this not affect the previous terms? That is the key of BM algorithm. The terms in the bracket will become zero when we evaluate it for earlier terms.

For example, when we evaluate Guess 2 for $a_5$, the terms in the bracket will be $a_3-2a_2$ which is actually zero. In fact, only because it was zero, we made it our first guess.

Now that we have an additional factor of $k$ in our 'Guess 2' now, we can use it to our advantage.  Choosing $k=-4$, we remove the discrepancy at the sixth position.

From now on, we can continue using the same idea. Anytime we face a discrepancy, we can choose an right multiple (based on the current and previous deviation) of our previous guess (shifted by the right amount) to correct that discrepancy. And this continues for all elements in the given sequence, thus completing the algorithm.

This type of problem comes every now and then in Competitive programming. For a given problem, we can quickly code a brute force method that can churn out the first few terms and we would be interested in finding a linear recurrence if there is one.

If we have the recurrence, finding the $n$th term (even for $n$ of order $10^{10^7}$) of the recurrence can be found in a matter of seconds. For example, we have this wonderful Codechef Editorial that explains how to do that.

Here is an (over)commented Python code of the BM algorithm based on what I wrote above.

from timeit import default_timer

start = default_timer()

def berlmassey(l0, p0):
     # Input: list of length 2n and a prime p
     # Output: Recurrence of length n that takes the first n terms as initial values and generates the next n (and possibly the further) terms
     l, p = l0[:], p0
     if len(l) <= 1:
          return l
     if l[0] != 1:          # WLOG, first element of the list should be 1
          return []
     old_rec = []          # there is no recurrence to start with
     v = 1          # deviation of the (empty) recurrence that occurs in the first element
     old_pos = 0          # trivially the list was fine until the first element
     curr_rec = [l[1] % p]          # we guess something that's trivially true
     curr_pos = 1          # starting with the second element of the list
     while curr_pos < len(l):
          temp = 0
          for i in range(len(curr_rec)):          # check what the current recurrence gives for the next element
               temp += curr_rec[i] * l[curr_pos - i - 1]
               temp %= p
          temp = (l[curr_pos] - temp) % p          # is there a deviation?
          if temp == 0:          # If no deviation, current recurrence is still good and hence moving on..
               curr_pos += 1
               continue
          temp_curr_rec = curr_rec[:]          # just a temporary storage
          v = temp * pow(v, -1, p) % p          # normalizing factor based on current and last deviation
          old_rec = [p - x for x in old_rec]          # re-strucuturing the last successful recurrence
          old_rec = [1] + old_rec
          old_rec = [0] * (curr_pos - old_pos - 1) + old_rec
          if len(old_rec) > len(curr_rec):          # making last successful and current recurrences the same length
               curr_rec = curr_rec + [0] * (len(old_rec) - len(curr_rec))
          else:
               old_rec = old_rec + [0] * (len(curr_rec) - len(old_rec))
          curr_rec = [(curr_rec[i] + v * old_rec[i]) % p for i in range(len(curr_rec))]          # updating the current recurrence with the normalizing constant and the (old) last successful recurrence
          old_rec = temp_curr_rec[:]          # the recurrence we started with becomes the last successful recurrence
          v = temp          # the deviation that occured at this position
          old_pos = curr_pos          # this position becomes the last deviated position
          curr_pos += 1          # On to the next iteration..
     return curr_rec


print(berlmassey([1, 2, 4, 10, 24, 50, 124], 1000000007))
print(default_timer() - start)

Finally, a couple of points regarding the code. As there are divisions involved, I have used a prime and made everything with modulo operations. We can also use exact divisions if there is a need. Hope you enjoyed it.


Until then
Yours Aye
Me

Tuesday, March 3, 2020

A Binary Grid Counting Problem - Part I


This post is inspired by the A collection of grid counting problems. The entire wordpress page is a great read and I thoroughly enjoy it every time I visit.

In Example 5 of the quoted page, the author discusses the number of ways of coloring a $n \times n$ grid in which every row and every column has exactly two black squares. In the example that's immediately next, he discusses a counting problem that uses one of favorite results in combinatorics - the Burnside's Lemma.

That got me thinking about the number of unique (upto rotations and reflections) colorings of a $2n \times 2n$ grid with exactly two black squares in each row and column.

We'll also be using Burnside's lemma for this task. We'll be considering the eight symmetries of the square and count how many colorings are fixed by each of these symmetries. We'll progressing on the increasing order of difficulty. Let's begin.

Identity - Let $t'_n$ denote total number of such colorings on an $n \times n$ grid. This is essentially what the wordpress page was doing. In fact we have A001499 that gives us a recurrence. With $t'_0=1$, we have

$t'_n=n(n-1)t'_{n-1}+\frac{1}{2}n(n-1)^2t'_{n-2}$

As we are only interested in $2n \times 2n$ grid, let's define $t_n=t'_{2n}$. With $t_0=1$, this follows the recurrence,

$\displaystyle t_n=\frac{n(2n-1)^2}{2n-3}(8n^2-16n+7)t_{n-1}-2n(n-1)^2(2n-1)^2(2n-3)t_{n-2}$

(or)

$\displaystyle \frac{t_n}{2n-1}=n(2n-1)(8n^2-16n+7)\frac{t_{n-1}}{2n-3}-2n(n-1)^2(2n-1)(2n-3)(2n-5)\frac{t_{n-2}}{2n-5}$

Diagonal (Anti-Diagonal) symmetry - Let $d'_n$ denote the required colorings that are symmetric along the main diagonal. Again, we have A000986 that gives us the following recurrence. Starting with $d'_0=1$,

$d'_n=2(n-1)d'_{n-1}-(n-1)(n-3)d'_{n-2}-\frac{1}{2}(n-1)(n-2)(n-3)d'_{n-4}$

Again, let's define $d_n=d'_{2n}$ which is the number of required colorings with diagonal (anti-diagonal) symmetry in a $2n \times 2n$ grid. With $d_0=1$,

$\displaystyle d_n=\frac{2n-1}{2n-3}(8n^2-16n+7)d_{n-1}-\frac{(n-1)(2n-1)}{2n-5}(16n^3-104n^2+230n-173)d_{n-2}-2(n-1)(n-2)(2n-1)(8n^2-38n+41)d_{n-3}-2(n-1)(n-2)(n-3)(2n-1)(2n-3)(2n-7)d_{n-4}$

(or)

$\displaystyle \frac{d_n}{2n-1}=(8n^2-16n+7)\frac{d_{n-1}}{2n-3}-(n-1)(16n^3-104n^2+230n-173)\frac{d_{n-2}}{2n-5}-2(n-1)(n-2)(2n-7)(8n^2-38n+41)\frac{d_{n-3}}{2n-7}-2(n-1)(n-2)(n-3)(2n-3)(2n-7)(2n-9)\frac{d_{n-4}}{2n-9}$

Horizontal (Vertical) reflection - Let $r_n$ denote the number of colorings that are fixed by Horizontal reflection on a $2n \times 2n$ grid. As we need the grid to be the same after reflecting it horizontally (vertically), we only have to fill up the top $n \times 2n$ grid. We have to make sure that each row has exactly two black squares and each column has exactly one black square in the $n \times 2n$ grid.

Proceeding in the same way as in the wordpress page, we can see that,
$\begin{align}
r_n &= [x_1^2x_2^2\cdots x_n^2y_1y_2\cdots y_{2n}]\prod_{i=1}^n\prod_{j=1}^{2n}(1+x_iy_j)\\
&= [x_1^2x_2^2\cdots x_n^2y_1y_2\cdots y_{2n}]\prod_{j=1}^{2n}(1+(x_1+x_2+\cdots+x_n)y_j+O(y_j^2))\\
&= [x_1^2x_2^2\cdots x_n^2](x_1+x_2+\cdots+x_n)^{2n}=2^{-n}(2n)!\\
\end{align}$

So, we have the recurrence $r_n=n(2n-1)r_{n-1}$ with $r_0=1$.

180 deg. rotation symmetry - Let $f_n$ denote the number of colorings of a $2n \times 2n$ grid that are fixed by rotating the grid by 180 degrees. Again like the previous case, we only have to fill up the top $n \times 2n$ grid. We still make sure that each row has exactly two black squares but unlike the previous case, we need to ensure that the total number of black squares in the $j$-th column and $(2n+1-j)$-th column is two.
$\begin{align}
f_n&=[x_1^2x_2^2\cdots x_n^2y_1y_2\cdots y_{2n}]\prod_{i=1}^n\prod_{j=1}^{2n}(1+x_iy_j)\\
&=[x_1^2x_2^2\cdots x_n^2t_1^2t_2^2\cdots t_n^2]\prod_{i=1}^n\prod_{j=1}^n(1+x_it_j)^2\\
&=[x_1^2x_2^2\cdots x_n^2t_1^2t_2^2\cdots t_n^2]\prod_{j=1}^n(1+(x_1+x_2+\cdots +x_n)t_j+(x_1x_2+x_2x_3+\cdots+x_{n-1}x_n)t_j^2+O(t_j^3))^2\\
&=\displaystyle[x_1^2x_2^2\cdots x_n^2]\Biggl(\biggl(\sum_{i=1}^n x_i\biggl)^2+2\sum_{1\leq i<j\leq n} x_ix_j\Biggl)^n\\
&=\displaystyle[x_1^2x_2^2\cdots x_n^2]\Biggl(2\biggl(\sum_{i=1}^n x_i\biggl)^2-\sum_{i=1}^n x_i^2\Biggl)^n\\
&=\displaystyle[x_1^2x_2^2\cdots x_n^2]\sum_{k=0}^n(-1)^k\binom{n}{k}\biggl(\sum_{i=1}^n x_i^2\biggl)^k 2^{n-k}\biggl(\sum_{i=1}^n x_i\biggl)^{2(n-k)}\\
&=\displaystyle\sum_{k=0}^n(-1)^k\binom{n}{k}\frac{n!}{(n-k)!} 2^{n-k}\frac{(2n-2k)!}{2^{n-k}}\\
&=\displaystyle\sum_{k=0}^n(-1)^k \binom{n}{k}^2k!(2n-2k)! \\
\end{align}$

where we have used $t_i=y_i=y_{2n+1-i}$ in the second equality.

Even though we have a closed form, I still wanted to find the recurrence relation. Taking a cue from A001499, I found the following generating function for $f_n$.

$\displaystyle \sum_{n=0}^\infty f_n \frac{z^n}{n!^2}=\frac{e^{-z}}{\sqrt{1-4z}}$

This helped me in finding A296618 which has a recurrence relation for the EGF coefficients of the above function which are actually $\frac{f_n}{n!}$. From there it was straightforward to see that with $f_0=1$,

$f_n=n(4n-3)f_{n-1}+4n(n-1)^2f_{n-2}$

90 deg. Clockwise (Counter-Clockwise) symmetry - Let $c_n$ denote the number of colorings of a $2n \times 2n$ grid that are fixed by rotating the grid by 90 degrees (270 degrees). Now we only have to fill the first quadrant of the grid. This time we have,

$c_n=\displaystyle[x_1^2x_2^2\cdots x_{2n}^2 y_1^2y_2^2\cdots y_{2n}^2]\prod_{i=1}^n\prod_{j=1}^n(1+x_iy_j\cdot x_jy_{2n+1-i}\cdot x_{2n+1-i}y_{2n+1-j}\cdot x_{2n+1-j}y_i)$

Now let $t_i=x_iy_ix_{2n+1-i}y_{2n+1-i}$. Then,

$c_n=\displaystyle[t_1^2t_2^2\cdots t_n^2]\prod_{i=1}^n\prod_{j=1}^n(1+t_it_j)$

I was not able to simplify this in anyway and I tried a bunch of approaches. Finally, I gave up and found the sequence with Mathematica. The values for $n=0,1,2,\cdots$ are $1, 1,2,12,90,810,8940,\cdots$. Unfortunately, this time we don't have a OEIS sequence to directly get a recurrence. After laboring for a day, I found the following recurrence with a lot of paper work. With $c_0=1$,

$c_n=(2n-1)c_{n-1}-(n-1)c_{n-2}+2(n-1)(n-2)c_{n-3}$

Searching for a generating function, we get,

$\displaystyle\sum_{n=0}^\infty c_n\frac{z^n}{n!}=\frac{e^{-z^2/2}}{\sqrt{1-2z}}$

From this we get the following explicit formula.

$\displaystyle c_n=\frac{n!}{2^n}\sum_{k=0}^n \frac{(-2)^k}{k!}\binom{2n-4k}{n-2k}$

With ideas from Algorithms forHolonomic FunctionsAlgorithmic Manipulations and Transformations of Univariate Holonomic Functions and Sequences and Computing the Generating Function of a series given its First few terms, it seems we can find the above recurrence from the generating function.

Back to the main question of this post, we finally have everything we need to apply Burnside's lemma. Let $A_n$ denote the number of unique (upto rotations and reflections) colorings of a $2n \times 2n$ grid with exactly two black squares in each row and column. Then,

$A_n=\displaystyle\frac{1}{8}(t'_{2n}+2d'_{2n}+2r_n+f_n+2c_n)=\frac{1}{8}(t_n+2d_n+2r_n+f_n+2c_n)$

On a side note, the number of unique (upto rotations and reflections) colorings of a $n \times n$ grid with exactly one black square in each row and column can be found completely using OEIS. The only thing we have to note is that there can be no colorings for vertical or horizontal reflections. If $B_n$ is the required result, then

$\displaystyle B_n=\frac{1}{8}(4\text{ A263685}(n)+2\text{ A000085}(n))$

Hope you enjoyed this. See ya later with another interesting topic.


Until then
Yours Aye
Me

Friday, January 24, 2020

A nice result in Geometry


Euler's theorem in Geometry is a great result that connects the distance between the circumcentre and incentre of a triangle with the circumradius and inradius of the triangle. A similar result, known as Fuss's theorem, holds for bi-centric quadrilaterals. It was surprising when I found a similar relation holds for three mutually tangent circular arcs.

Consider the three blue circles $A$, $B$ and $C$ (with radii $r_A$, $r_B$ and $r_C$ respectively) that are mutually tangent to each other. Let the points of tangency be $T_A$, $T_B$ and $T_C$ which becomes the vertices of our 'triangle of circular arcs'.

We can now draw two circles: a circumcirle (red) that passes through all the tangency points and an incircle (green) that is mutually internally tangent to all the blue circles. Let's position the configuration in the Argand plane such that center of the incircle is at the origin.

Let the centers of the blue circles be $z_A$, $z_B$ and $z_C$, and the center of the circumcircle be $z$.



Now if we let $R$ and $r$ be the circumradius and inradius of our triangle, then
$$\text{Eqn (1): }d^2=R^2-4Rr+r^2$$
where $d=\vert z \vert$ is the distance between the centers of the circumcircle and incircle.

To prove this, Consider $\triangle z_A z_B z_C$ whose side lengths are $r_A+r_B$, $r_B+r_C$ and $r_C+r_A$. It is easy to show that,

$\text{Eqn (2): }R^2=\displaystyle\frac{r_Ar_Br_C}{r_A+r_B+r_C}$ and $\text{Eqn (3): }r^{-1}={r_A}^{-1}+{r_B}^{-1}+{r_C}^{-1}+2R^{-1}$

$(2)$ is obtained by noting that the red circle is the incircle of $\triangle z_A z_B z_C$ and $(3)$ is a direct application of Descartes' theorem.

Now given the vertices of a complex triangle, we can express the incenter of the triangle using its vertices (using this). Now we have,

$\displaystyle z=\sum_{\text{cyc}}\frac{r_B+r_C}{2(r_A+r_B+r_C)}z_A \implies \text{Eqn (4): }2(r_A+r_B+r_C)z=\sum_{\text{cyc}}(r_B+r_C)z_A$

Multiplying $(4)$ with its complex conjugate, we have,

$\text{Eqn (5): }4(r_A+r_B+r_C)^2\vert z \vert^2=\displaystyle\sum_{\text{cyc}}(r_B+r_C)^2\vert z_A \vert^2+\sum_{\text{cyc}}(r_B+r_C)(r_C+r_A)(z_A\bar{z_B}+\bar{z_A}z_B)$

From the figure, we see that $\vert z_B-z_A \vert=r_A+r_B$. Using the fact that the product of a complex number and its conjugate is the square of its magnitude, we can see that,

$\text{Eqn (6): }z_A\bar{z_B}+\bar{z_A}z_B=\vert z_A \vert^2+\vert z_B \vert^2-\vert z_B-z_A\vert^2=\vert z_A \vert^2+\vert z_B \vert^2-(r_A+r_B)^2$

Using $(6)$ in $(5)$, and cancelling the common terms, we get,

$\text{Eqn (7): }2(r_A+r_B+r_C)\vert z \vert^2=\displaystyle\sum_{\text{cyc}}\vert z_A \vert^2(r_B+r_c)-\prod_{\text{cyc}}(r_A+r_B)$

From the figure above, we have $\vert z_x \vert=r+r_x$ for $x \in \{A,B,C\}$. Using this we get,

$\begin{align}
\displaystyle\text{Eqn (8): }\sum_{\text{cyc}}\vert z_A \vert^2(r_B+r_c)&=\sum_{\text{cyc}}(r^2+2rr_A+r_A^2)(r_B+r_C) \\
&= 2r^2(r_A+r_B+r_C)+4r(r_Ar_B+r_Br_C+r_Cr_A)+\sum_{\text{cyc}}r_A^2(r_B+r_C) \\
\end{align}$

Using $(8)$ in $(7)$, we get,

$\text{Eqn (9): }\displaystyle2(r_A+r_B+r_C)\vert z \vert^2=2r^2(r_A+r_B+r_C)+4r(r_Ar_B+r_Br_C+r_Cr_A)+\sum_{\text{cyc}}r_A^2(r_B+r_C)-\prod_{\text{cyc}}(r_A+r_B)$

Now the second term of $(9)$ can be simplified as
$\begin{align}
\text{Eqn (10): }r(r_Ar_B+r_Br_C+r_Cr_A)&=rr_Ar_Br_C(r_A^{-1}+r_B^{-1}+r_C^{-1})\\
&=rR^2(r_A+r_B+r_C)(r^{-1}-2R^{-1})\\
&=R(R-2r)(r_A+r_B+r_C)
\end{align}$
where we have used $(2)$ and $(3)$ in the second equality.

The last two terms of $(9)$ can be expanded and most of the terms gets cancelled giving,

$\displaystyle\text{Eqn (11): }\sum_{\text{cyc}}r_A^2(r_B+r_C)-\prod_{\text{cyc}}(r_A+r_B)=-2r_Ar_Br_C=-2R^2(r_A+r_B+r_C)$
where we have used $(2)$ in the last equality.

Using $(10)$ and $(11)$ in $(9)$ and dividing both sides by $2(r_A+r_B+r_C)$ , we finally have $(1)$.

From $(1)$, it follows that, $R\geq (2+\sqrt{3})r$ which holds with equality only when $r_A=r_B=r_C$.

This can probably be proved with geometry but I couldn't get my head around it.

Finally, before we finish this discussion, we also note that we can completely parametrize $(1)$ for non-co-prime integer solutions. For integer $a,b$ such that $\gcd(a,b)=1$, $3 \nmid b$ and $b \leq \sqrt{3}a$

(i) If $a\not\equiv b \pmod{2}$, then $R=3a^2+4ab+b^2$, $r=2ab$ and $d=3a^2-b^2$
(ii) If $a\equiv b \pmod{2}$, then $R=(3a^2+4ab+b^2)/2$, $r=ab$ and $d=(3a^2-b^2)/2$

Hope you found this interesting. See ya later.

References:
Tangencies: Three tangent Circles
Efficiently constructing tangent circles
Inversion and applications to Ptolemy and Euler


Until then
Yours Aye
Me

Thursday, December 5, 2019

Bitwise Transforms


Bitwise operators have always fascinated me. There are a number of cool tricks and tips that one can use in programming to shorten the code and even elegant, I would say. To start with, we have Bit twiddling hacks that contains a huge list. I've used some of them in my Nim Multiplication code.

But this code is not about using those tricks. Bitwise operations are interesting from mathematical viewpoint as well. For example, the one that has always stumped me was the Hadamard Transform that are intimately connected with the Bitwise XOR. In one of their wide variety of applications, they play a great role in XOR transforms.

Diving straight into the point, if we have two arrays $\mathbf{a}=(a_0,a_1,a_2,\cdots)$ and $\mathbf{b}=(b_0,b_1,b_2,\cdots)$ and we define the convolution as an array $\mathbf{c}=(c_0,c_1,c_2,\cdots)$ such that,

$c_n=\displaystyle\sum_{i*j=n}a_ib_j$

where '*' is any binary operation. For example, when the binary operation is 'addition', then we have the most commonly used Discrete convolution. If the operation is 'multiplication', we have what we call the Dirichlet convolution.

We can also the Bitwise operations as the binary operation. Most famous of them would be the 'Bitwise XOR' which results in the Dyadic convolution. Similarly, we can define OR convolution and AND convolution using Bitwise-OR and Bitwise-AND as the binary operations respectively.

If evaluated directly, all these convolutions would lead to $O(n^2)$ time complexity. A common technique in making this task easier is defining an easily invertible matrix $\mathbf{M}$ such that

$\mathbf{M}\mathbf{c}=(\mathbf{M}\mathbf{a})\odot(\mathbf{M}\mathbf{b})$

where $\odot$ is the element-wise multiplication. For example, $\mathbf{M}$ is the Vandermonde matrix in case of the Discrete convolution.

For the Dyadic convolution (or the XOR convolution), the matrix we are looking for are the Hadamard matrices, $H$. We can construct the matrices with $H=(1)$ and iteratively constructing the higher order matrices with either of the following two matrices

$H \to
\begin{pmatrix}
H & H \\
H & -H
\end{pmatrix}
$ (or) $H \to
\begin{pmatrix}
H & -H \\
H & H
\end{pmatrix}
$

though the first seems to the most popular choice. Hadamard matrices are very simple in that they are their own inverse which makes inverse part just like the transform. In my opinion, this is very analogous to how XOR can as both addition and subtraction.

The idea in this post is to find the relevant matrices for OR and AND convolutions. Fiddling with the first few terms of the matrix equation we saw before with $\mathbf{c}$ defined as OR/AND accordingly, it is not so hard to come with the following.

$\mathbf{M}_{\text{OR}} \to
\begin{pmatrix}
\mathbf{M}_{\text{OR}} & \mathbf{0} \\
\mathbf{M}_{\text{OR}} & \mathbf{M}_{\text{OR}}
\end{pmatrix}
$ (or) $\mathbf{M}_{\text{OR}} \to
\begin{pmatrix}
\mathbf{M}_{\text{OR}} & \mathbf{M}_{\text{OR}} \\
\mathbf{M}_{\text{OR}} & \mathbf{0}
\end{pmatrix}
$

and

$\mathbf{M}_{\text{AND}} \to
\begin{pmatrix}
\mathbf{M}_{\text{AND}} & \mathbf{M}_{\text{AND}} \\
\mathbf{0} & \mathbf{M}_{\text{AND}}
\end{pmatrix}
$ (or) $\mathbf{M}_{\text{AND}} \to
\begin{pmatrix}
\mathbf{0} & \mathbf{M}_{\text{AND}} \\
\mathbf{M}_{\text{AND}} & \mathbf{M}_{\text{AND}}
\end{pmatrix}
$

We can choose either of the matrices for OR/AND. I prefer the first matrices in both cases mostly because the first versions have determinant 1.

Having done that, we follow up with the inverse matrices, $\mathbf{N}_{\text{OR}}$ and $\mathbf{N}_{\text{AND}}$, which can also be iteratively built.

$\mathbf{N}_{\text{OR}} \to
\begin{pmatrix}
\mathbf{N}_{\text{OR}} & \mathbf{0} \\
-\mathbf{N}_{\text{OR}} & \mathbf{N}_{\text{OR}}
\end{pmatrix}
$ and   $\mathbf{N}_{\text{AND}} \to
\begin{pmatrix}
\mathbf{N}_{\text{AND}} & -\mathbf{N}_{\text{AND}} \\
\mathbf{0} & \mathbf{N}_{\text{AND}}
\end{pmatrix}
$

Now these can be used to answer questions like finding the number of sequences containing numbers from $0$ to $k$ with a given OR-sum (or AND-sum).

I think one of the reason that the Dyadic (XOR) convolution is more famous is because of their application in nim games. Unfortunately, OR and AND convolutions does not have any direct applications that I'm aware of. Still, I found these beautiful and worth sharing.

Just to conclude the story, to solve the LCM-convolution between two functions, we dirichlet-convolve each of them with the constant function. Now we simply do a pointwise multiplication and dirichlet-convolve the result with Mobius function. This is well explained in this stackexchange question and Multiplicative Arithmetic Functions of SeveralVariables: A Survey. These also discuss GCD-convolution but I don't understand them yet.

UPDATE: I always knew that I may not be the first ones to come up with the corresponding matrices for OR and AND convolution but I was a little disappointed when I found Serbanology when I was reading All the good tutorials found for Competitive programming.

Until then
Yours Aye
Me

Monday, October 14, 2019

Nim Multiplication


I was recently working on a problem that need the idea of nim-multiplication. While nim-addition plays a nice role in all 1D combinatorial games, nim-multiplication takes its place in their 2D counterparts. An excellent read on the subject of Impartial Combinatorial Games can be found in Thomas S. Ferguson's Game Theory.

Now, back to nim-multiplication. While nim-addition, denoted by $\oplus$, is easy to calculate in a computer with a simple BitXOR operation, nim-multiplication, denoted by $\otimes$, is not as simple. Every source I found in the Internet listed the following two rules to simplify nim-multiplication.

  • Rule 1: The nim-product of a Fermat power and any smaller number is just their ordinary product.
  • Rule 2: The nim-product of a Fermat power with itself is 3/2-th of the Fermat power.
These two rules, in addition to the fact that nim-product is distributive w.r.t nim-addition and the commutative property of nim-multiplication, certainly simplifies the task.

To further simplify the task, we can make use of the following two additional rules.
  • Rule 3: For any sequence of non-negative integers $p_0<p_1<p_2<\cdots$, we have $2^{2^{p_0}}\otimes2^{2^{p_1}}\otimes2^{2^{p_2}}\otimes\cdots=2^{2^{p_0}}\cdot2^{2^{p_1}}\cdot2^{2^{p_2}}\cdot\cdots$. This also shows that any power of 2 can be written as nim-product of distinct Fermat powers by writing the exponent as sums of powers-of -2.
  • Rule 4:If $x\text{ AND }y=0$, then $x\oplus y=x + y$
  • Rule 5: If $x\text{ AND }y=0$, then $2^x\otimes 2^y=2^x\cdot2^y$
Rule 3 works because we can repeatedly apply Rule 1 for nim-products starting from the left and convert them to ordinary products. Note that any resulting term on the left will always be smaller than the term that comes after.

Rule 4 works because the AND condition makes sure that no bits are identical in the binary representation of $x$ and $y$.

Rule 5 works because the nim-product can be converted to nim-products of Fermat powers. The AND condition makes sure that no two Fermat powers are equal and hence we can sort the Fermat powers and apply Rule 3.

Before solving for the nim-multiplication problem, let's just revisit some properties of the $\text{AND}$ product. It is easy that $\text{AND}$ satisfies the following three properties.

  • $x\text{ AND }x=x$
  • Associative: $x\text{ AND }(y \text{ AND }z)=(x\text{ AND }y)\text{ AND }z$
  • Commutative: $x\text{ AND }y=y\text{ AND }x$

Using these properties, we can show that

  • P1: $(x\text{ AND }y)\text{ AND }(x-(x\text{ AND }y))=0$
  • P2: $(x\text{ AND }y)\text{ AND }(y-(x\text{ AND }y))=0$
  • P3: $ (x-(x\text{ AND }y))\text{ AND }(y-(x\text{ AND }y)) = 0$

Now the nim-product of arbitrary numbers can be split into several nim-products of powers-of-2 because
$x\otimes y=(2^{x_0}\oplus2^{x_1}\oplus2^{x_2}\oplus\cdots)\otimes(2^{y_0}\oplus2^{y_1}\oplus2^{y_2}\oplus\cdots)=\displaystyle\sum_{i,j \geq 0}2^{x_i}\otimes 2^{y_j}$
where the summation symbol should be understood as a nim-sum.

Let's first solve for nim-squares of powers-of-2. That is a nim-product of a power-of-2 with itself.

\begin{align}
2^x\otimes 2^x & = (2^{x-(x\text{ AND }-x)}\cdot 2^{x\text{ AND }-x})\otimes (2^{x-(x\text{ AND }-x)}\cdot 2^{x\text{ AND }-x}) \\
& = (2^{x-(x\text{ AND }-x)}\otimes 2^{x\text{ AND }-x})\otimes (2^{x-(x\text{ AND }-x)}\otimes 2^{x\text{ AND }-x})  \\
& = (2^{x\text{ AND }-x}\otimes 2^{x\text{ AND }-x})\otimes (2^{x-(x\text{ AND }-x)}\otimes 2^{x-(x\text{ AND }-x)}) \\
& = (3/2)2^{x\text{ AND }-x}\otimes(2^{x-(x\text{ AND }-x)}\otimes 2^{x-(x\text{ AND }-x)}) \\
\end{align}
We could convert ordinary product in the second equality into an nim-product because the AND product of the exponents can be clearly seen to be zero by using  $y=-x$ in P1.

Now for the nim-product of two distinct powers-of-2,

\begin{align}
2^x\otimes 2^y & = (2^{x-(x\text{ AND }y)}\cdot 2^{x\text{ AND }y})\otimes (2^{y-(x\text{ AND }y)}\cdot 2^{x\text{ AND }y}) \\
& = (2^{x-(x\text{ AND }y)}\otimes 2^{x\text{ AND }y})\otimes (2^{y-(x\text{ AND }y)}\otimes 2^{x\text{ AND }y})  \\
& = (2^{x-(x\text{ AND }y)}\otimes 2^{y-(x\text{ AND }y)})\otimes (2^{x\text{ AND }y}\otimes 2^{x\text{ AND }y}) \\
& = 2^{x-(x\text{ AND }y)}\cdot 2^{y-(x\text{ AND }y)}\otimes (2^{x\text{ AND }y}\otimes 2^{x\text{ AND }y})\\
\end{align}
In the first equality, we have used both P1 and P2 to convert from ordinary product to nim-product. In the last equality, we have used P3 to do the opposite.

The two results above together form the recursion that can be used to solve the nim-multiplication problem. Together with the facts that the nim-product of any number with 0 is 0 and the nim-product of a number with 1 is the number itself, it's not so hard to code the above for nim-multiplication.

UPDATE: My python implementation of the above algorithm.

def memoize(fn):
    """returns a memoized version of any function that can be called
    with the same list of arguments.
    Usage: foo = memoize(foo)"""

    def handle_item(x):
        if isinstance(x, dict):
            return make_tuple(sorted(x.items()))
        elif hasattr(x, '__iter__'):
            return make_tuple(x)
        else:
            return x

    def make_tuple(L):
        return tuple(handle_item(x) for x in L)

    def foo(*args, **kwargs):
        items_cache = make_tuple(sorted(kwargs.items()))
        args_cache = make_tuple(args)
        if (args_cache, items_cache) not in foo.past_calls:
            foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
        return foo.past_calls[(args_cache, items_cache)]
    foo.past_calls = {}
    foo.__name__ = 'memoized_' + fn.__name__
    return foo

@memoize
def nimmult(x, y):
  if x < y:
    return nimmult(y, x)
  elif y == 0:
    return 0
  elif y == 1:
      return x
  elif x & (x - 1):
      return nimmult((x & - x), y) ^ nimmult(x - (x & - x), y)
  elif y & (y - 1):
      return nimmult(x, (y & - y)) ^ nimmult(x, y - (y & - y))
  elif x in [2, 4, 16, 256, 65536, 4294967296]:
      return 3 * x // 2 if x == y else x * y
  elif x == y:
      tempx, p = x, 0
      while tempx > 1:
          tempx >>= 1
          p += 1
      p = p & -p
      p = 1 << p
      return nimmult(3 * p // 2, nimmult(x // p, y // p))
  else:
      tempx, tempy, px, py = x, y, 0, 0
      while tempx > 1:
          tempx >>= 1
          px += 1
      while tempy > 1:
          tempy >>= 1
          py += 1
      temp = px & py
      temp = 1 << temp
      return nimmult(x // temp * y // temp, nimmult(temp, temp))

I've included the memoize operator that I usually use in the code. In summary, Yes.. It's as simple as that :)


Until then
Yours Aye
Me 

Saturday, September 21, 2019

A simple problem in Geometric probability


I recently came across a Stackexchange post that asks for the probability that a point chosen randomly inside an equilateral triangle is closer to the center than to any of its edges. The square case was Problem B1 in 50th Putnam 1989.

I thought what about a general n-gon. Obviously for a circle, the answer should be 1/4. To tackle, the general case, we can consider the center of the polygon to be the origin and orient the polygon in such a way that one of its edges is perpendicular to the positive $y$-axis.

We can scale the polygon such that this edge passes through $(0,1)$. In other words, we scale the polygon such that the inradius is $1$. With the polygon scaled to a unit-inradius, the side of the polygon becomes $2\tan(\pi/n)$.

Now, we know that the locus of the points that are equidistant from the origin and to the line $y=1$ is a parabola. Using the formulae for distance between two points and distance between a point and a line, we have,

$x^2+y^2=\displaystyle\frac{(y-1)^2}{1^2+0^2}$

The above formulae is the same one given in this Wikipedia page. Simplifying the above we find the equation of the parabola as $y=(1-x^2)/2$.

Using the symmetry of the $n$-gon, we only focus only on the triagulation of the polygon formed by the origin and the edge that lies along the line $y=1$. The area of one such triangulation can be easily seen as $\tan(\frac{\pi}{n})$.

The lines through the origin bounding this edge are $y=\pm x \tan(\pi/2-\pi/n)=\pm x \cot(\frac{\pi}{n}) $. From now on, we restrict ourselves to the positive sign again using symmetry to our advantage.

The point of intersection of the line $y=\cot(\frac{\pi}{n})x$ and the parabola $y=(1-x^2)/2$. Solving the resulting quadratic we can simplify the positive root as $x=\tan(\frac{\pi}{2n})$. Therefore the point of intersection is $(\tan(\frac{\pi}{2n}),\tan(\frac{\pi}{2n})\cot(\frac{\pi}{n}))$.

To simplify notation, let $t=\tan(\frac{\pi}{2n})$ from now on. Therefore, the line becomes $y=\pm \frac{1-t^2}{2t}x$, the point of intersection becomes $(t,\frac{1-t^2}{2})$ and the area of the traingulation becomes $\frac{2t}{1-t^2}$.

Now, the area bounded by the parabola and the $x$-axis, is given by,

$A=\displaystyle\int_0^t \frac{1-x^2}{2}\,dx=\frac{t}{6}(3-t^2)$

and

the area of the triangle formed by origin, the point $x=t$ and the point of intersection of the parabola and the line is $T=\frac{1}{2}\cdot t \cdot \frac{1-t^2}{2}$.

In each of the triangulations of the $n$-gon, the radom point will be closer to the center than to the edge if it falls within the region bounded by the lines $y=\pm \frac{1-t^2}{2t}x$ and the parabola. If we denote the probability of such an event by $p_n$, then

$p_n=\displaystyle\frac{2(A-T)}{\tan(\frac{\pi}{n})}=\frac{(t^2+3)(1-t^2)}{12}$

Substituting for $t$ and simplifying, we have $p_n=\frac{1}{12}(4-\sec^4(\frac{\pi}{2n}))$


Until then
Yours Aye
Me

Saturday, August 17, 2019

Estimating a sum with Probability


Let's say we want to find the the sum
$S=\displaystyle\sum_{k=1}^nf(k)$
but we don't have a closed form or any easier way to calculate this. One way of doing this is to use the idea of approximating an integral.

We can take choose some $m$ random values $0=X_0,X_1,X_2,\cdots,X_m$ between $1$ and $n$ such that $0=X_0<X_1<X_2<\cdots,X_m\leq n$ and use the following expression to get an approximation.

$S_r=\displaystyle\sum_{i=1}^m f(X_i)(X_i-X_{i-1})$

But how good is this approximation? If we define the error as $\Delta=S-S_r$, what is $\mathbb{E}(\Delta)$? In this post, we are precisely trying to answer that.

Let's first calculate $\mathbb{E}(S_r)$. Using linearity of expectation, we can find the expected value of each term in the expression for $S_r$.

$\mathbb{E}(f(X_i)(X_i-X_{i-1}))=\displaystyle\frac{1}{\binom{n}{m}}\sum_{x_1=1}^{x_2-1}\sum_{x_2=1}^{x_3-1}\cdots\sum_{x_{i-1}=1}^{x_i-1}\sum_{x_i=1}^n\sum_{x_{i+1}=x_i+1}^n\cdots\sum_{x_{m-1}=x_{m-2}+1}^n\sum_{x_{m}=x_{m-1}+1}^nf(x_i)(x_i-x_{i-1})$

To simplify this sum, we need the following beautiful result.

$\displaystyle\sum_{x_1=1}^{x_2-1}\sum_{x_2=1}^{x_3-1}\cdots\sum_{x_k=1}^{x_{k+1}-1}1=\frac{(x_{k+1}-1)_{(k)}}{k!}$ and $\displaystyle\sum_{x_2=x_1+1}^n\sum_{x_3=x_2+1}^n\cdots\sum_{x_{k+1}=x_k+1}^n1=\frac{(n-x_1)_{(k)}}{k!}$

Note that how they mimic repeated integration of the constant function. Using the above results, we first simplify the expectation to

$\begin{align}
\mathbb{E}(f(X_i)(X_i-X_{i-1}))&=\displaystyle\frac{1}{\binom{n}{m}}\sum_{x_{i-1}=1}^{x_i-1}\sum_{x_i=1}^nf(x_i)(x_i-x_{i-1})\frac{(x_{i-1}-1)_{(i-2)}}{(i-2)!}\frac{(n-x_i)_{(m-i)}}{(m-i)!}\\
&=\frac{1}{\binom{n}{m}}\sum_{x_i=1}^nf(x_i)\frac{(x_i)_i}{i!}\frac{(n-x_i)_{(m-i)}}{(m-i)!}\\
&=\frac{1}{\binom{n}{m}}\sum_{k=1}^nf(k)\frac{(k)_i}{i!}\frac{(n-k)_{(m-i)}}{(m-i)!}\\
&=\frac{1}{(n)_{(m)}}\sum_{k=1}^nf(k)\binom{m}{i}(k)_i(n-k)_{(m-i)}\\
\end{align}$

Therefore,
$\begin{align}
\mathbb{E}(S_r)&=\displaystyle\frac{1}{(n)_{(m)}}\sum_{i=1}^m\sum_{k=1}^nf(k)\binom{m}{i}(k)_i(n-k)_{(m)}\\
&=\frac{1}{(n)_{(m)}}\sum_{k=1}^nf(k)\bigl((n)_{(m)}-(n-k)_{(m)}\bigl)\\
&=S-\frac{1}{(n)_{(m)}}\sum_{k=1}^nf(k)(n-k)_{(m)}\\
\end{align}$

where we have the used the falling factorial analogue of the binomial theorem in the second equality. Now that we can have the result for the expected value of the error as,

$\begin{align}
\mathbb{E}(\Delta)&=\displaystyle\frac{1}{(n)_{(m)}}\sum_{k=1}^{n-m}f(k)(n-k)_{(m)}=\frac{1}{\binom{n}{m}}\sum_{k=1}^{n-m}f(k)\binom{n-k}{m}
\end{align}$

That's pretty nice, isn't it?

I find this random sums extremely good in certain cases. Especially, I was surprised when I tried to approximate the sum of totients and the divisor function. Considering that these number theoretic functions are themselves apparently random, the expected relative errors are surprising small. You should try them out for yourselves.

UPDATE(13 May 2020): We can also extend this method to find a trapezoidal-rule-like approximations. That is, we choose a random $m$-tuple of positive integers $(X_1,X_2,\cdots, X_m) $ such that $0=X_0<X_1<X_2<\cdots<X_m<X_{m+1}=n+1$ and find an 'approximation' using the modified sum,

$\begin{align}
\displaystyle S^*&=\frac{f(X_1)+f(X_0)}{2}(X_1-X_0)+\frac{f(X_2)+f(X_1)}{2}(X_2-X_1)+\cdots+\frac{f(X_{m+1})+f(X_m)}{2}(X_{m+1}-X_m) \\
&= \displaystyle \sum_{k=0}^m \frac{f(X_{k+1})+f(X_k)}{2}(X_{k+1}-X_k)
\end{align}$

The expected error $\delta=S-S^*$ in this is case is then,

$\mathbb{E}(\delta)=\displaystyle\frac{1}{n_{(m)}}\sum_{k=1}^{n-m}\frac{f(k)+f(n-k+1)}{2}(n-k)_{(m)}-\frac{f(0)+f(n+1)}{2}\left(\frac{n+1}{m+1}\right)$

For example, with $n=10^6$ and $m=10^3$,

  • For the Euler's totient function $\varphi(k)$, assuming $\varphi(0)=0$, $\mathbb{E}(\delta) \approx -0.0629\%$ of $S$
  • For the number of positive divisors function $\sigma_0(k)$, assuming $\sigma_0(0)=0$,  $\mathbb{E}(\delta) \approx 0.0662\%$ of $S$
  • For the sum of positive divisors function $\sigma_1(n)$, assuming $\sigma_1(0)=0$, $\mathbb{E}(\delta) \approx 0.0385\%$ of $S$
  • With $f(k)=1-1/k$, taking $f(0)=0$, $\mathbb{E}(\delta) \approx 0.0495\%$ of $S$


Until then
Yours Aye
Me