Thursday, December 5, 2019

Bitwise Transforms


Bitwise operators have always fascinated me. There are a number of cool tricks one can use in programming to shorten code and even make it elegant, I would say. To start with, we have Bit Twiddling Hacks, which contains a huge list. I've used some of them in my Nim Multiplication code.

But this post is not about using those tricks. Bitwise operations are interesting from a mathematical viewpoint as well. For example, the one that has always stumped me is the Hadamard transform, which is intimately connected with the bitwise XOR. Among their wide variety of applications, Hadamard matrices play a great role in XOR transforms.

Diving straight into the point: given two arrays $\mathbf{a}=(a_0,a_1,a_2,\cdots)$ and $\mathbf{b}=(b_0,b_1,b_2,\cdots)$, we define their convolution as the array $\mathbf{c}=(c_0,c_1,c_2,\cdots)$ such that

$c_n=\displaystyle\sum_{i*j=n}a_ib_j$

where '$*$' is any binary operation. For example, when the binary operation is addition, we have the most commonly used discrete convolution. If the operation is multiplication, we have what we call the Dirichlet convolution.

We can also use bitwise operations as the binary operation. The most famous of them would be bitwise XOR, which results in the dyadic convolution. Similarly, we can define the OR convolution and the AND convolution using bitwise OR and bitwise AND as the binary operations, respectively.
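
As a concrete sketch (the function name convolve is mine, not from any library), here is the definition evaluated directly, with the binary operation passed in as a parameter:

def convolve(a, b, op):
    """Direct evaluation: c[n] = sum of a[i] * b[j] over all i, j with op(i, j) == n."""
    c = [0] * len(a)    # assumes op(i, j) < len(a); true for XOR/OR/AND when len(a) is a power of two
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[op(i, j)] += ai * bj
    return c

a, b = [1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1]
xor_c = convolve(a, b, lambda i, j: i ^ j)
or_c = convolve(a, b, lambda i, j: i | j)
and_c = convolve(a, b, lambda i, j: i & j)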

If evaluated directly, as above, all these convolutions take $O(n^2)$ time. A common technique for making this task easier is defining an easily invertible matrix $\mathbf{M}$ such that

$\mathbf{M}\mathbf{c}=(\mathbf{M}\mathbf{a})\odot(\mathbf{M}\mathbf{b})$

where $\odot$ is the element-wise multiplication. For example, $\mathbf{M}$ is a Vandermonde matrix (the DFT matrix) in the case of the discrete convolution.

For the dyadic convolution (or the XOR convolution), the matrices we are looking for are the Hadamard matrices, $H$. We can construct them starting with $H=(1)$ and iteratively building the higher-order matrices with either of the following two recursions

$H \to
\begin{pmatrix}
H & H \\
H & -H
\end{pmatrix}
$ (or) $H \to
\begin{pmatrix}
H & -H \\
H & H
\end{pmatrix}
$

though the first seems to be the more popular choice. Hadamard matrices are very simple in that they are their own inverse (up to division by the order $n$), which makes the inverse transform just like the forward transform. In my opinion, this is very analogous to how XOR can act as both addition and subtraction.
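
Here is a minimal sketch of the resulting fast transform and the XOR convolution built on it (function names are mine; inputs are assumed to be integer arrays whose length is a power of two, and the final division by $n$ is the normalization just mentioned):

def wht(a):
    """In-place Walsh-Hadamard transform from the [[H, H], [H, -H]] recursion.
    Applying it twice multiplies every entry by len(a)."""
    n, h = len(a), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def xor_convolve(a, b):
    """Dyadic convolution via M c = (M a) . (M b), in O(n log n)."""
    n = len(a)
    fa, fb = wht(a[:]), wht(b[:])
    return [x // n for x in wht([x * y for x, y in zip(fa, fb)])]

For integer inputs, xor_convolve(a, b) should agree with the direct $O(n^2)$ evaluation sketched earlier.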

The idea in this post is to find the corresponding matrices for the OR and AND convolutions. Fiddling with the first few terms of the matrix equation we saw before, with $\mathbf{c}$ defined via OR/AND accordingly, it is not so hard to come up with the following.

$\mathbf{M}_{\text{OR}} \to
\begin{pmatrix}
\mathbf{M}_{\text{OR}} & \mathbf{0} \\
\mathbf{M}_{\text{OR}} & \mathbf{M}_{\text{OR}}
\end{pmatrix}
$ (or) $\mathbf{M}_{\text{OR}} \to
\begin{pmatrix}
\mathbf{M}_{\text{OR}} & \mathbf{M}_{\text{OR}} \\
\mathbf{M}_{\text{OR}} & \mathbf{0}
\end{pmatrix}
$

and

$\mathbf{M}_{\text{AND}} \to
\begin{pmatrix}
\mathbf{M}_{\text{AND}} & \mathbf{M}_{\text{AND}} \\
\mathbf{0} & \mathbf{M}_{\text{AND}}
\end{pmatrix}
$ (or) $\mathbf{M}_{\text{AND}} \to
\begin{pmatrix}
\mathbf{0} & \mathbf{M}_{\text{AND}} \\
\mathbf{M}_{\text{AND}} & \mathbf{M}_{\text{AND}}
\end{pmatrix}
$

We can choose either of the matrices for OR/AND. I prefer the first matrix in both cases, mostly because the first versions have determinant 1.

Having done that, we follow up with the inverse matrices, $\mathbf{N}_{\text{OR}}$ and $\mathbf{N}_{\text{AND}}$, which can also be iteratively built.

$\mathbf{N}_{\text{OR}} \to
\begin{pmatrix}
\mathbf{N}_{\text{OR}} & \mathbf{0} \\
-\mathbf{N}_{\text{OR}} & \mathbf{N}_{\text{OR}}
\end{pmatrix}
$ and   $\mathbf{N}_{\text{AND}} \to
\begin{pmatrix}
\mathbf{N}_{\text{AND}} & -\mathbf{N}_{\text{AND}} \\
\mathbf{0} & \mathbf{N}_{\text{AND}}
\end{pmatrix}
$
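
As a sketch (function names mine again), the first recursion for $\mathbf{M}_{\text{OR}}$ and its inverse $\mathbf{N}_{\text{OR}}$ translate into the following in-place transforms; the first computes the subset sums $A_i=\sum_{j \text{ OR } i = i} a_j$:

def zeta_or(a):
    """Apply M_OR from the [[M, 0], [M, M]] recursion."""
    n, h = len(a), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a[j + h] += a[j]
        h *= 2
    return a

def mobius_or(a):
    """Apply the inverse N_OR from the [[N, 0], [-N, N]] recursion."""
    n, h = len(a), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a[j + h] -= a[j]
        h *= 2
    return a

def or_convolve(a, b):
    """OR convolution via M c = (M a) . (M b)."""
    fa, fb = zeta_or(a[:]), zeta_or(b[:])
    return mobius_or([x * y for x, y in zip(fa, fb)])

The AND versions mirror these: the butterflies become a[j] += a[j + h] and a[j] -= a[j + h], matching the first $\mathbf{M}_{\text{AND}}$ and $\mathbf{N}_{\text{AND}}$ recursions above.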

Now these can be used to answer questions like finding the number of sequences containing numbers from $0$ to $k$ with a given OR-sum (or AND-sum).

I think one of the reasons that the dyadic (XOR) convolution is more famous is its application in Nim games. Unfortunately, the OR and AND convolutions do not have any direct applications that I'm aware of. Still, I found these beautiful and worth sharing.

Just to conclude the story: to compute the LCM convolution of two functions, we Dirichlet-convolve each of them with the constant function, do a pointwise multiplication, and Dirichlet-convolve the result with the Möbius function. This is well explained in this StackExchange question and in Multiplicative Arithmetic Functions of Several Variables: A Survey. These also discuss the GCD convolution, but I don't understand them yet.
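
A rough sketch of that recipe (function names mine; the arrays are 1-indexed with index 0 unused, which works because $\operatorname{lcm}(i,j)$ divides $k$ exactly when both $i$ and $j$ do):

def divisor_zeta(f):
    """Dirichlet convolution with the constant function: F[k] = sum of f[d] over divisors d of k."""
    n = len(f) - 1
    F = [0] * (n + 1)
    for d in range(1, n + 1):
        for k in range(d, n + 1, d):
            F[k] += f[d]
    return F

def divisor_mobius(F):
    """The inverse: Dirichlet convolution with the Mobius function."""
    n = len(F) - 1
    f = list(F)
    for d in range(1, n + 1):
        for k in range(2 * d, n + 1, d):
            f[k] -= f[d]
    return f

a = [0, 1, 2, 3, 4, 5, 6]
b = [0, 6, 5, 4, 3, 2, 1]
A, B = divisor_zeta(a), divisor_zeta(b)
c = divisor_mobius([x * y for x, y in zip(A, B)])    # c[k] = sum of a[i] * b[j] over lcm(i, j) = k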

UPDATE: I always knew that I may not be the first one to come up with the corresponding matrices for the OR and AND convolutions, but I was still a little disappointed when I found Serbanology while reading All the good tutorials found for Competitive programming.

Until then
Yours Aye
Me

Monday, October 14, 2019

Nim Multiplication


I was recently working on a problem that needed the idea of nim-multiplication. While nim-addition plays a nice role in all 1D combinatorial games, nim-multiplication takes its place in their 2D counterparts. An excellent read on the subject of impartial combinatorial games can be found in Thomas S. Ferguson's Game Theory.

Now, back to nim-multiplication. While nim-addition, denoted by $\oplus$, is easy to calculate on a computer with a simple BitXOR operation, nim-multiplication, denoted by $\otimes$, is not as simple. Every source I found on the Internet listed the following two rules to simplify nim-multiplication.

  • Rule 1: The nim-product of a Fermat power (a number of the form $2^{2^k}$) and any smaller number is just their ordinary product.
  • Rule 2: The nim-product of a Fermat power with itself is $3/2$ times the Fermat power.
These two rules, together with the fact that the nim-product is distributive over nim-addition and the commutativity of nim-multiplication, certainly simplify the task.

To further simplify the task, we can make use of the following two additional rules.
  • Rule 3: For any sequence of non-negative integers $p_0<p_1<p_2<\cdots$, we have $2^{2^{p_0}}\otimes2^{2^{p_1}}\otimes2^{2^{p_2}}\otimes\cdots=2^{2^{p_0}}\cdot2^{2^{p_1}}\cdot2^{2^{p_2}}\cdots$. This also shows that any power of 2 can be written as a nim-product of distinct Fermat powers by writing the exponent as a sum of powers of 2.
  • Rule 4: If $x\text{ AND }y=0$, then $x\oplus y=x + y$.
  • Rule 5: If $x\text{ AND }y=0$, then $2^x\otimes 2^y=2^x\cdot2^y$.
Rule 3 works because we can repeatedly apply Rule 1 to the nim-products starting from the left and convert them to ordinary products. Note that any resulting term on the left will always be smaller than the Fermat power that comes after it.

Rule 4 works because the AND condition makes sure that no set bits are shared between the binary representations of $x$ and $y$, so XOR-ing them is the same as adding them.

Rule 5 works because each power of 2 can be converted to a nim-product of Fermat powers. The AND condition makes sure that no two of these Fermat powers are equal, hence we can sort them and apply Rule 3.

Before solving the nim-multiplication problem, let's revisit some properties of the $\text{AND}$ operation. It is easy to see that $\text{AND}$ satisfies the following three properties.

  • Idempotent: $x\text{ AND }x=x$
  • Associative: $x\text{ AND }(y \text{ AND }z)=(x\text{ AND }y)\text{ AND }z$
  • Commutative: $x\text{ AND }y=y\text{ AND }x$

Using these properties, we can show the following (a quick brute-force check comes right after the list).

  • P1: $(x\text{ AND }y)\text{ AND }(x-(x\text{ AND }y))=0$
  • P2: $(x\text{ AND }y)\text{ AND }(y-(x\text{ AND }y))=0$
  • P3: $ (x-(x\text{ AND }y))\text{ AND }(y-(x\text{ AND }y)) = 0$
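
These hold because $x\text{ AND }y$ is a submask of both $x$ and $y$, so subtracting it simply clears those common bits. A quick brute-force check (a sanity check, not a proof):

# exhaustively verify P1, P2 and P3 for all small x and y
for x in range(256):
    for y in range(256):
        c = x & y
        assert c & (x - c) == 0          # P1
        assert c & (y - c) == 0          # P2
        assert (x - c) & (y - c) == 0    # P3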

Now the nim-product of arbitrary numbers can be split into several nim-products of powers-of-2 because
$x\otimes y=(2^{x_0}\oplus2^{x_1}\oplus2^{x_2}\oplus\cdots)\otimes(2^{y_0}\oplus2^{y_1}\oplus2^{y_2}\oplus\cdots)=\displaystyle\sum_{i,j \geq 0}2^{x_i}\otimes 2^{y_j}$
where the summation symbol should be understood as a nim-sum.

Let's first solve for nim-squares of powers of 2, that is, the nim-product of a power of 2 with itself.

\begin{align}
2^x\otimes 2^x & = (2^{x-(x\text{ AND }-x)}\cdot 2^{x\text{ AND }-x})\otimes (2^{x-(x\text{ AND }-x)}\cdot 2^{x\text{ AND }-x}) \\
& = (2^{x-(x\text{ AND }-x)}\otimes 2^{x\text{ AND }-x})\otimes (2^{x-(x\text{ AND }-x)}\otimes 2^{x\text{ AND }-x})  \\
& = (2^{x\text{ AND }-x}\otimes 2^{x\text{ AND }-x})\otimes (2^{x-(x\text{ AND }-x)}\otimes 2^{x-(x\text{ AND }-x)}) \\
& = (3/2)2^{x\text{ AND }-x}\otimes(2^{x-(x\text{ AND }-x)}\otimes 2^{x-(x\text{ AND }-x)}) \\
\end{align}
We could convert the ordinary product in the second equality into a nim-product because the AND of the exponents can clearly be seen to be zero by using $y=-x$ in P1 (note that $x\text{ AND }-x$ isolates the lowest set bit of $x$).

Now for the nim-product of two distinct powers-of-2,

\begin{align}
2^x\otimes 2^y & = (2^{x-(x\text{ AND }y)}\cdot 2^{x\text{ AND }y})\otimes (2^{y-(x\text{ AND }y)}\cdot 2^{x\text{ AND }y}) \\
& = (2^{x-(x\text{ AND }y)}\otimes 2^{x\text{ AND }y})\otimes (2^{y-(x\text{ AND }y)}\otimes 2^{x\text{ AND }y})  \\
& = (2^{x-(x\text{ AND }y)}\otimes 2^{y-(x\text{ AND }y)})\otimes (2^{x\text{ AND }y}\otimes 2^{x\text{ AND }y}) \\
& = \left(2^{x-(x\text{ AND }y)}\cdot 2^{y-(x\text{ AND }y)}\right)\otimes (2^{x\text{ AND }y}\otimes 2^{x\text{ AND }y})\\
\end{align}
In the second equality, we have used both P1 and P2 to convert the ordinary products into nim-products. In the last equality, we have used P3 to do the opposite.

The two results above together form the recursion that can be used to solve the nim-multiplication problem. Together with the facts that the nim-product of any number with 0 is 0 and the nim-product of a number with 1 is the number itself, it's not so hard to code the above.

UPDATE: My Python implementation of the above algorithm.

def memoize(fn):
    """returns a memoized version of any function that can be called
    with the same list of arguments.
    Usage: foo = memoize(foo)"""

    def handle_item(x):
        if isinstance(x, dict):
            return make_tuple(sorted(x.items()))
        elif hasattr(x, '__iter__'):
            return make_tuple(x)
        else:
            return x

    def make_tuple(L):
        return tuple(handle_item(x) for x in L)

    def foo(*args, **kwargs):
        items_cache = make_tuple(sorted(kwargs.items()))
        args_cache = make_tuple(args)
        if (args_cache, items_cache) not in foo.past_calls:
            foo.past_calls[(args_cache, items_cache)] = fn(*args,**kwargs)
        return foo.past_calls[(args_cache, items_cache)]
    foo.past_calls = {}
    foo.__name__ = 'memoized_' + fn.__name__
    return foo

@memoize
def nimmult(x, y):
    if x < y:
        return nimmult(y, x)
    elif y == 0:
        return 0
    elif y == 1:
        return x
    elif x & (x - 1):
        # x is not a power of 2: split off its lowest set bit and distribute
        return nimmult(x & -x, y) ^ nimmult(x - (x & -x), y)
    elif y & (y - 1):
        # likewise for y
        return nimmult(x, y & -y) ^ nimmult(x, y - (y & -y))
    elif x in [2, 4, 16, 256, 65536, 4294967296]:
        # x is a Fermat power: Rule 2 if the factors are equal, Rule 1 otherwise (y < x here)
        return 3 * x // 2 if x == y else x * y
    elif x == y:
        # nim-square of a power of 2: peel off the smallest Fermat-power factor (see above)
        p = x.bit_length() - 1    # the exponent of x
        p = 1 << (p & -p)         # 2^(lowest set bit of the exponent), a Fermat power
        return nimmult(3 * p // 2, nimmult(x // p, y // p))
    else:
        # distinct powers of 2: pull out the Fermat powers common to both exponents
        px, py = x.bit_length() - 1, y.bit_length() - 1
        temp = 1 << (px & py)
        return nimmult(x // temp * y // temp, nimmult(temp, temp))

I've included the memoize decorator that I usually use in my code. In summary, yes, it's as simple as that :)
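
As a quick check, here are a few small nim-products that follow directly from the rules:

assert nimmult(0, 7) == 0 and nimmult(1, 7) == 7
assert nimmult(2, 2) == 3    # Rule 2: (3/2) * 2
assert nimmult(2, 3) == 1    # 2 (x) (2 (+) 1) = (2 (x) 2) (+) 2 = 3 ^ 2
assert nimmult(3, 3) == 2    # the nimbers {0, 1, 2, 3} form the field GF(4)
assert nimmult(4, 4) == 6    # Rule 2: (3/2) * 4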


Until then
Yours Aye
Me 

Saturday, September 21, 2019

A simple problem in Geometric probability


I recently came across a StackExchange post that asks for the probability that a point chosen randomly inside an equilateral triangle is closer to the center than to any of its edges. The square case was Problem B1 in the 50th Putnam (1989).

I wondered: what about a general $n$-gon? Obviously, for a circle the answer should be $1/4$. To tackle the general case, we can take the center of the polygon as the origin and orient the polygon so that one of its edges is perpendicular to the positive $y$-axis.

We can scale the polygon so that this edge passes through $(0,1)$; in other words, we scale the polygon so that the inradius is $1$. With the polygon scaled to unit inradius, the side length becomes $2\tan(\pi/n)$.

Now, we know that the locus of the points that are equidistant from the origin and to the line $y=1$ is a parabola. Using the formulae for distance between two points and distance between a point and a line, we have,

$x^2+y^2=\displaystyle\frac{(y-1)^2}{1^2+0^2}$

The above formula is the same as the one given in this Wikipedia page. Simplifying, we find the equation of the parabola as $y=(1-x^2)/2$.

Using the symmetry of the $n$-gon, we focus only on the triangle of the triangulation formed by the origin and the edge that lies along the line $y=1$. The area of one such triangle is easily seen to be $\tan(\frac{\pi}{n})$.

The lines through the origin bounding this edge are $y=\pm x \tan(\pi/2-\pi/n)=\pm x \cot(\frac{\pi}{n})$. From now on, we restrict ourselves to the positive sign, again using symmetry to our advantage.

Next, we find the point of intersection of the line $y=x\cot(\frac{\pi}{n})$ and the parabola $y=(1-x^2)/2$. Solving the resulting quadratic, the positive root simplifies to $x=\tan(\frac{\pi}{2n})$. Therefore the point of intersection is $(\tan(\frac{\pi}{2n}),\tan(\frac{\pi}{2n})\cot(\frac{\pi}{n}))$.

To simplify notation, let $t=\tan(\frac{\pi}{2n})$ from now on. Then the lines become $y=\pm \frac{1-t^2}{2t}x$, the point of intersection becomes $(t,\frac{1-t^2}{2})$, and the area of the triangle becomes $\frac{2t}{1-t^2}$ (by the double-angle formula $\tan(\frac{\pi}{n})=\frac{2t}{1-t^2}$).

Now, the area bounded by the parabola and the $x$-axis, is given by,

$A=\displaystyle\int_0^t \frac{1-x^2}{2}\,dx=\frac{t}{6}(3-t^2)$

and

the area of the triangle formed by the origin, the point $(t,0)$, and the point of intersection of the parabola and the line is $T=\frac{1}{2}\cdot t \cdot \frac{1-t^2}{2}$.

In each triangle of the triangulation of the $n$-gon, the random point will be closer to the center than to the edge if it falls within the region bounded by the lines $y=\pm \frac{1-t^2}{2t}x$ and the parabola. If we denote the probability of such an event by $p_n$, then

$p_n=\displaystyle\frac{2(A-T)}{\tan(\frac{\pi}{n})}=\frac{(t^2+3)(1-t^2)}{12}$

Substituting for $t$ and simplifying (using $1+t^2=\sec^2(\frac{\pi}{2n})$), we have $p_n=\frac{1}{12}\left(4-\sec^4(\frac{\pi}{2n})\right)$. As a check, $p_n\to 1/4$ as $n\to\infty$, matching the circle, and $n=4$ gives $\frac{4\sqrt{2}-5}{3}$ for the square.
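
Here's a Monte Carlo sketch to validate the formula (my own code, mirroring the setup above: inradius 1, edges tangent to the unit incircle; the orientation of the polygon doesn't affect the probability):

import math
import random

def p_monte_carlo(n, samples=10**5):
    """Estimate the probability that a uniform point in a regular n-gon with
    inradius 1 is closer to the center than to the nearest edge."""
    R = 1.0 / math.cos(math.pi / n)    # circumradius
    normals = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
               for k in range(n)]      # outward unit normals of the n edges
    hits = inside = 0
    while inside < samples:
        x, y = random.uniform(-R, R), random.uniform(-R, R)
        s = max(x * cx + y * cy for cx, cy in normals)
        if s > 1:                      # rejection step: the point lies outside the polygon
            continue
        inside += 1
        if math.hypot(x, y) < 1 - s:   # 1 - s is the distance to the nearest edge
            hits += 1
    return hits / inside

n = 5
print(p_monte_carlo(n), (4 - (1 / math.cos(math.pi / (2 * n))) ** 4) / 12)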


Until then
Yours Aye
Me

Saturday, August 17, 2019

Estimating a sum with Probability


Let's say we want to find the sum
$S=\displaystyle\sum_{k=1}^nf(k)$
but we don't have a closed form or any easier way to calculate it. One way of doing this is to borrow the idea of approximating an integral.

We can choose $m$ random values $X_1,X_2,\cdots,X_m$ between $1$ and $n$ (and set $X_0=0$) such that $0=X_0<X_1<X_2<\cdots<X_m\leq n$, and use the following expression to get an approximation.

$S_r=\displaystyle\sum_{i=1}^m f(X_i)(X_i-X_{i-1})$

But how good is this approximation? If we define the error as $\Delta=S-S_r$, what is $\mathbb{E}(\Delta)$? That is precisely what we try to answer in this post.

Let's first calculate $\mathbb{E}(S_r)$. Using linearity of expectation, we can find the expected value of each term in the expression for $S_r$.

$\mathbb{E}(f(X_i)(X_i-X_{i-1}))=\displaystyle\frac{1}{\binom{n}{m}}\sum_{x_1=1}^{x_2-1}\sum_{x_2=1}^{x_3-1}\cdots\sum_{x_{i-1}=1}^{x_i-1}\sum_{x_i=1}^n\sum_{x_{i+1}=x_i+1}^n\cdots\sum_{x_{m-1}=x_{m-2}+1}^n\sum_{x_{m}=x_{m-1}+1}^nf(x_i)(x_i-x_{i-1})$

To simplify this sum, we need the following beautiful result.

$\displaystyle\sum_{x_1=1}^{x_2-1}\sum_{x_2=1}^{x_3-1}\cdots\sum_{x_k=1}^{x_{k+1}-1}1=\frac{(x_{k+1}-1)_{(k)}}{k!}$ and $\displaystyle\sum_{x_2=x_1+1}^n\sum_{x_3=x_2+1}^n\cdots\sum_{x_{k+1}=x_k+1}^n1=\frac{(n-x_1)_{(k)}}{k!}$

Note how they mimic repeated integration of the constant function. Using the above results, we first simplify the expectation to

$\begin{align}
\mathbb{E}(f(X_i)(X_i-X_{i-1}))&=\displaystyle\frac{1}{\binom{n}{m}}\sum_{x_{i-1}=1}^{x_i-1}\sum_{x_i=1}^nf(x_i)(x_i-x_{i-1})\frac{(x_{i-1}-1)_{(i-2)}}{(i-2)!}\frac{(n-x_i)_{(m-i)}}{(m-i)!}\\
&=\frac{1}{\binom{n}{m}}\sum_{x_i=1}^nf(x_i)\frac{(x_i)_{(i)}}{i!}\frac{(n-x_i)_{(m-i)}}{(m-i)!}\\
&=\frac{1}{\binom{n}{m}}\sum_{k=1}^nf(k)\frac{(k)_{(i)}}{i!}\frac{(n-k)_{(m-i)}}{(m-i)!}\\
&=\frac{1}{(n)_{(m)}}\sum_{k=1}^nf(k)\binom{m}{i}(k)_{(i)}(n-k)_{(m-i)}\\
\end{align}$

Therefore,
$\begin{align}
\mathbb{E}(S_r)&=\displaystyle\frac{1}{(n)_{(m)}}\sum_{i=1}^m\sum_{k=1}^nf(k)\binom{m}{i}(k)_{(i)}(n-k)_{(m-i)}\\
&=\frac{1}{(n)_{(m)}}\sum_{k=1}^nf(k)\bigl((n)_{(m)}-(n-k)_{(m)}\bigr)\\
&=S-\frac{1}{(n)_{(m)}}\sum_{k=1}^nf(k)(n-k)_{(m)}\\
\end{align}$

where we have used the falling factorial analogue of the binomial theorem (the Vandermonde identity) in the second equality. We now have the expected value of the error:

$\begin{align}
\mathbb{E}(\Delta)&=\displaystyle\frac{1}{(n)_{(m)}}\sum_{k=1}^{n-m}f(k)(n-k)_{(m)}=\frac{1}{\binom{n}{m}}\sum_{k=1}^{n-m}f(k)\binom{n-k}{m}
\end{align}$

That's pretty nice, isn't it?

I find these random sums extremely good in certain cases. In particular, I was surprised when I tried to approximate the sum of totients and the divisor function: considering that these number-theoretic functions are themselves apparently random, the expected relative errors are surprisingly small. You should try them out for yourselves.
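
Here's a minimal sketch of the estimator and the error formula (function names mine; a simple polynomial stands in for $f$ to keep it self-contained):

import math
import random

def random_sum_estimate(f, n, m):
    """One draw of S_r = sum of f(X_i) * (X_i - X_{i-1}) over a sorted m-subset of 1..n."""
    xs = sorted(random.sample(range(1, n + 1), m))
    prev, s_r = 0, 0
    for x in xs:
        s_r += f(x) * (x - prev)
        prev = x
    return s_r

def expected_error(f, n, m):
    """E(Delta) = (1 / C(n, m)) * sum_{k=1}^{n-m} f(k) * C(n-k, m)."""
    return sum(f(k) * math.comb(n - k, m) for k in range(1, n - m + 1)) / math.comb(n, m)

f = lambda k: k * k
n, m, trials = 1000, 30, 2000
S = sum(f(k) for k in range(1, n + 1))
avg = sum(random_sum_estimate(f, n, m) for _ in range(trials)) / trials
print(S - avg, expected_error(f, n, m))    # the two should be close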

UPDATE (13 May 2020): We can also extend this method to get a trapezoidal-rule-like approximation. That is, we choose a random $m$-tuple of positive integers $(X_1,X_2,\cdots, X_m)$ such that $0=X_0<X_1<X_2<\cdots<X_m<X_{m+1}=n+1$ and find an 'approximation' using the modified sum,

$\begin{align}
\displaystyle S^*&=\frac{f(X_1)+f(X_0)}{2}(X_1-X_0)+\frac{f(X_2)+f(X_1)}{2}(X_2-X_1)+\cdots+\frac{f(X_{m+1})+f(X_m)}{2}(X_{m+1}-X_m) \\
&= \displaystyle \sum_{k=0}^m \frac{f(X_{k+1})+f(X_k)}{2}(X_{k+1}-X_k)
\end{align}$

The expected error $\delta=S-S^*$ in this case is then,

$\mathbb{E}(\delta)=\displaystyle\frac{1}{(n)_{(m)}}\sum_{k=1}^{n-m}\frac{f(k)+f(n-k+1)}{2}(n-k)_{(m)}-\frac{f(0)+f(n+1)}{2}\left(\frac{n+1}{m+1}\right)$

For example, with $n=10^6$ and $m=10^3$ (a sketch of this variant follows the list),

  • For the Euler's totient function $\varphi(k)$, assuming $\varphi(0)=0$, $\mathbb{E}(\delta) \approx -0.0629\%$ of $S$
  • For the number of positive divisors function $\sigma_0(k)$, assuming $\sigma_0(0)=0$,  $\mathbb{E}(\delta) \approx 0.0662\%$ of $S$
  • For the sum of positive divisors function $\sigma_1(k)$, assuming $\sigma_1(0)=0$, $\mathbb{E}(\delta) \approx 0.0385\%$ of $S$
  • With $f(k)=1-1/k$, taking $f(0)=0$, $\mathbb{E}(\delta) \approx 0.0495\%$ of $S$
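
Here is a sketch of this trapezoidal variant, implementing $S^*$ and the error formula above as stated (function names mine; I use the last example's $f(k)=1-1/k$ with $f(0)=0$, and a smaller $n$ and $m$ than above to keep the run fast):

import math
import random

def trapezoidal_estimate(f, n, m):
    """One draw of S* with X_0 = 0 and X_{m+1} = n + 1 fixed."""
    xs = [0] + sorted(random.sample(range(1, n + 1), m)) + [n + 1]
    return sum((f(xs[k + 1]) + f(xs[k])) / 2 * (xs[k + 1] - xs[k]) for k in range(m + 1))

def expected_trapezoidal_error(f, n, m):
    """E(delta) from the formula above, using (n-k)_(m) / (n)_(m) = C(n-k, m) / C(n, m)."""
    head = sum((f(k) + f(n - k + 1)) / 2 * math.comb(n - k, m)
               for k in range(1, n - m + 1)) / math.comb(n, m)
    return head - (f(0) + f(n + 1)) / 2 * (n + 1) / (m + 1)

f = lambda k: 0 if k == 0 else 1 - 1 / k
n, m, trials = 10000, 100, 400
S = sum(f(k) for k in range(1, n + 1))
avg_err = sum(S - trapezoidal_estimate(f, n, m) for _ in range(trials)) / trials
print(avg_err, expected_trapezoidal_error(f, n, m))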


Until then
Yours Aye
Me

Thursday, June 6, 2019

(A family of) Integrals independent of a parameter


I recently came across a very nice WordPress page. There is plenty of nice math on that page that I thoroughly enjoyed, especially the Distances to a line from the vertices of a regular polygon. In this post, we look at one of the results, based on Integrals not being dependent on a parameter from the same WordPress page.

The author shows that

$\displaystyle \int_0^\infty f(x^2)\,dx=\int_0^\infty f((x-a/x)^2)\,dx$,    $\text{Re}(a)\leq0$

This result can be further generalized. In fact, the presence of the square term facilitates the proof of the result via differentiation under the integral sign (sadly, only via that route), but that doesn't mean we can't take a different route to the general result. Taking a cue from Problem B4 of the 29th Putnam 1968, we show that,

$\displaystyle \int_{-\infty}^\infty f(x)\,dx=\int_{-\infty}^\infty f(x-a/x)\,dx$, for $a>0$, provided one of the integrals exists.

For example, here are the WolframAlpha results for $\sin x / x$ and $\sin(x-1/x)/(x-1/x)$. It's surprising that while Mathematica calculates the first integral exactly as $\pi$ in the blink of an eye, it fails to evaluate the second.
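
As a quick numerical check (assuming SciPy is available; I use a rapidly decaying integrand since naive quadrature struggles with the slowly converging $\sin x/x$): here $f(x)=e^{-x^2}$, $a=1$, and both integrals should equal $\sqrt{\pi}$.

import math
from scipy.integrate import quad

def f(x):
    return math.exp(-x * x)

def g(x):
    # the transformed integrand f(x - 1/x); it tends to 0 as x -> 0 from either side
    return 0.0 if x == 0 else f(x - 1.0 / x)

I1 = quad(f, -math.inf, math.inf)[0]
# split at x = 0, where 1/x blows up even though the integrand stays bounded
I2 = quad(g, -math.inf, 0)[0] + quad(g, 0, math.inf)[0]
print(I1, I2, math.sqrt(math.pi))    # all three should agree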

Let $I=\displaystyle \int_{-\infty}^\infty f(x)\,dx$

Substitute $x=y-a/y$ (this is where $a>0$ is needed). Now comes the cute part: note that when $y=0$, $x=-\infty$, and when $y=\infty$, $x=\infty$, so the substitution covers the whole real line even though $y$ runs only over $(0,\infty)$. Also, $dx=(1+\frac{a}{y^2})\,dy$.

Therefore,

$\begin{align}
I&=\displaystyle \int_0^\infty f(y-a/y)\left(1+\frac{a}{y^2}\right)\,dy\\
&=\displaystyle \int_0^\infty f(y-a/y)\,dy+\displaystyle \int_0^\infty f(y-a/y)\frac{a}{y^2}\,dy\\
\end{align}$

Now, we make the substitution $t=-\frac{a}{y}$ in the second integral, so that $dt=\frac{a}{y^2}\,dy$. Also, $y=0 \Rightarrow t=-\infty$ and $y=\infty \Rightarrow t=0$. Hence,

$\begin{align}
I&=\displaystyle \int_0^\infty f(y-a/y)\,dy+\displaystyle \int_{-\infty}^0 f(-a/t+t)\,dt\\
&=\displaystyle \int_{-\infty}^\infty f(x-a/x)\,dx
\end{align}$

which proves the result.

In any case, just like the author of the WordPress page, I too was surprised by the fact that the transformation $x \to x-a/x$ changes the shape of the curve drastically while keeping the area constant. A non-calculus-based proof of this result would be a feast to the senses, I guess, but I couldn't find one yet. See ya again with a nice post.


Until then
Yours Aye
Me