Saturday, August 17, 2019

Estimating a sum with Probability


Let's say we want to find the the sum
$S=\displaystyle\sum_{k=1}^nf(k)$
but we don't have a closed form or any easier way to calculate this. One way of doing this is to use the idea of approximating an integral.

We can take choose some $m$ random values $0=X_0,X_1,X_2,\cdots,X_m$ between $1$ and $n$ such that $0=X_0<X_1<X_2<\cdots,X_m\leq n$ and use the following expression to get an approximation.

$S_r=\displaystyle\sum_{i=1}^m f(X_i)(X_i-X_{i-1})$

But how good is this approximation? If we define the error as $\Delta=S-S_r$, what is $\mathbb{E}(\Delta)$? In this post, we are precisely trying to answer that.

Let's first calculate $\mathbb{E}(S_r)$. Using linearity of expectation, we can find the expected value of each term in the expression for $S_r$.

$\mathbb{E}(f(X_i)(X_i-X_{i-1}))=\displaystyle\frac{1}{\binom{n}{m}}\sum_{x_1=1}^{x_2-1}\sum_{x_2=1}^{x_3-1}\cdots\sum_{x_{i-1}=1}^{x_i-1}\sum_{x_i=1}^n\sum_{x_{i+1}=x_i+1}^n\cdots\sum_{x_{m-1}=x_{m-2}+1}^n\sum_{x_{m}=x_{m-1}+1}^nf(x_i)(x_i-x_{i-1})$

To simplify this sum, we need the following beautiful result.

$\displaystyle\sum_{x_1=1}^{x_2-1}\sum_{x_2=1}^{x_3-1}\cdots\sum_{x_k=1}^{x_{k+1}-1}1=\frac{(x_{k+1}-1)_{(k)}}{k!}$ and $\displaystyle\sum_{x_2=x_1+1}^n\sum_{x_3=x_2+1}^n\cdots\sum_{x_{k+1}=x_k+1}^n1=\frac{(n-x_1)_{(k)}}{k!}$

Note that how they mimic repeated integration of the constant function. Using the above results, we first simplify the expectation to

$\begin{align}
\mathbb{E}(f(X_i)(X_i-X_{i-1}))&=\displaystyle\frac{1}{\binom{n}{m}}\sum_{x_{i-1}=1}^{x_i-1}\sum_{x_i=1}^nf(x_i)(x_i-x_{i-1})\frac{(x_{i-1}-1)_{(i-2)}}{(i-2)!}\frac{(n-x_i)_{(m-i)}}{(m-i)!}\\
&=\frac{1}{\binom{n}{m}}\sum_{x_i=1}^nf(x_i)\frac{(x_i)_i}{i!}\frac{(n-x_i)_{(m-i)}}{(m-i)!}\\
&=\frac{1}{\binom{n}{m}}\sum_{k=1}^nf(k)\frac{(k)_i}{i!}\frac{(n-k)_{(m-i)}}{(m-i)!}\\
&=\frac{1}{(n)_{(m)}}\sum_{k=1}^nf(k)\binom{m}{i}(k)_i(n-k)_{(m-i)}\\
\end{align}$

Therefore,
$\begin{align}
\mathbb{E}(S_r)&=\displaystyle\frac{1}{(n)_{(m)}}\sum_{i=1}^m\sum_{k=1}^nf(k)\binom{m}{i}(k)_i(n-k)_{(m)}\\
&=\frac{1}{(n)_{(m)}}\sum_{k=1}^nf(k)\bigl((n)_{(m)}-(n-k)_{(m)}\bigl)\\
&=S-\frac{1}{(n)_{(m)}}\sum_{k=1}^nf(k)(n-k)_{(m)}\\
\end{align}$

where we have the used the falling factorial analogue of the binomial theorem in the second equality. Now that we can have the result for the expected value of the error as,

$\begin{align}
\mathbb{E}(\Delta)&=\displaystyle\frac{1}{(n)_{(m)}}\sum_{k=1}^{n-m}f(k)(n-k)_{(m)}=\frac{1}{\binom{n}{m}}\sum_{k=1}^{n-m}f(k)\binom{n-k}{m}
\end{align}$

That's pretty nice, isn't it?

I find this random sums extremely good in certain cases. Especially, I was surprised when I tried to approximate the sum of totients and the divisor function. Considering that these number theoretic functions are themselves apparently random, the expected relative errors are surprising small. You should try them out for yourselves.

UPDATE(13 May 2020): We can also extend this method to find a trapezoidal-rule-like approximations. That is, we choose a random $m$-tuple of positive integers $(X_1,X_2,\cdots, X_m) $ such that $0=X_0<X_1<X_2<\cdots<X_m<X_{m+1}=n+1$ and find an 'approximation' using the modified sum,

$\begin{align}
\displaystyle S^*&=\frac{f(X_1)+f(X_0)}{2}(X_1-X_0)+\frac{f(X_2)+f(X_1)}{2}(X_2-X_1)+\cdots+\frac{f(X_{m+1})+f(X_m)}{2}(X_{m+1}-X_m) \\
&= \displaystyle \sum_{k=0}^m \frac{f(X_{k+1})+f(X_k)}{2}(X_{k+1}-X_k)
\end{align}$

The expected error $\delta=S-S^*$ in this is case is then,

$\mathbb{E}(\delta)=\displaystyle\frac{1}{n_{(m)}}\sum_{k=1}^{n-m}\frac{f(k)+f(n-k+1)}{2}(n-k)_{(m)}-\frac{f(0)+f(n+1)}{2}\left(\frac{n+1}{m+1}\right)$

For example, with $n=10^6$ and $m=10^3$,

  • For the Euler's totient function $\varphi(k)$, assuming $\varphi(0)=0$, $\mathbb{E}(\delta) \approx -0.0629\%$ of $S$
  • For the number of positive divisors function $\sigma_0(k)$, assuming $\sigma_0(0)=0$,  $\mathbb{E}(\delta) \approx 0.0662\%$ of $S$
  • For the sum of positive divisors function $\sigma_1(n)$, assuming $\sigma_1(0)=0$, $\mathbb{E}(\delta) \approx 0.0385\%$ of $S$
  • With $f(k)=1-1/k$, taking $f(0)=0$, $\mathbb{E}(\delta) \approx 0.0495\%$ of $S$


Until then
Yours Aye
Me