Sunday, December 24, 2023

The Rounding Error Conundrum

Like all managers, I too, from time to time, will be in a position to present a bunch of numbers in an Excel Sheet. To bring clarity to the sheet, I follow the etiquette of rounding the numbers to (MS-Excel's default) two decimal places.

The problem this brings is that when you sum of list of numbers and round everything to, say, two decimal places, the numbers doesn't always add up and someone or the other points out that the numbers are not adding up in the last decimal place. I got tired of pointing out that it is a result of rounding.

At first, I thought rounding with more decimal places might bring some resolution to this situation. But then, it became apparent (and intuitively obvious) that no matter the rounding, the problem persists. Being an amateur mathematician at heart, I wanted to solve this problem mathematically.

Let $X_1,X_2,\cdots,X_n$ be standard uniform random variables $U(0,1)$ and $Y=X_1+X_2+\cdots+X_n$. We are then interested in the probability that $[Y]=[X_1]+[X_2]+\cdots+[X_n]$ where $[x]$ denotes the value of $x$ rounded to $r$ places after the decimal point.

As I started thinking about this problem, the only progress I could make is that $r$ doesn't affect the probability and we could as well consider it to be $0$. Frustrated, I posted in my puzzle group and Atul Gupta from the group was able to quickly solve this.

The key idea is to use the periodicity of the 'round' function. That is, $[n+x]=n+[x]$ for integer $n$ and real $x$.

The way to solve this problem then is transform the $X$'s such that $X_k=B_k+Z_k$ where $B_k$'s are either $0$ or $1$ and $Z_k \sim U(-0.5,0.5)$.

Because we take $r=0$, we can see that $[Z_k]=0$ for all $k$. Therefore, $[X_k]=[B_k+Z_k]=B_k+[Z_k]=B_k$ which means $[X_1]+[X_2]+\cdots+[X_n]=B_1+B_2+\cdots+B_n=\text{(say)}B$.

On the other hand,

$\begin{align} Y&=X_1+X_2+\cdots+X_n\\ &=B_1+Z_1+B_2+Z_2+\cdots+B_n+Z_n\\ &=B+Z_1+Z_2+\cdots+Z_n \end{align}$

Using the periodicity of 'round' function, $[Y]=B+[Z_1+Z_2+\cdots+Z_n]$.

Hence, the condition we are interested in simplifies to $[Z_1+Z_2+\cdots+Z_n]=0$ which is equivalent to saying $-0.5 \leq Z_1+Z_2+\cdots+Z_n \leq 0.5$

Transforming the $Z_k$'s to standard Uniform variables $U_k=Z_k+1/2$, this becomes $(n-1)/2 \leq U_1+U_2+\cdots+U_n \leq (n+1)/2$

It is well known that the sum of $n$ standard uniform random variables defines the Irwin-Hall distribution (also known as Uniform Sum distribution) $IH_n$.

Irwin-Hall distribution and Eulerian numbers are closely related. For example, this should already be apparent from the PDF of $IH_n$ in the Wikipedia page and the integral identity of Eulerian Numbers.

While the CDF of $IH_n$ also nearly resembles Eulerian numbers, it is not quite there. To make Eulerian numbers more useful to our cause, we slightly modify the formula given in Wiki page.

$\displaystyle A(n,x)=\sum_{i=0}^{x+1}(-1)^i\binom{n+1}{i}(x+1-i)^n$

where $x$ can be a real number. With this definition, we have the following useful result.

$\displaystyle\mathbb{P}(t \leq IH_n \leq t+1)=\frac{1}{n!}A(n,t)$

Therefore, we finally have what we were looking for.

$\displaystyle \mathbb{P}([Y]=[X_1]+[X_2]+\cdots+[X_n])=\mathbb{P}\left(\frac{n-1}{2} \leq \sum_{k=1}^nU_k \leq \frac{n+1}{2}\right)=\frac{1}{n!}A(n,(n-1)/2)$

For large $n$, we can use the normal approximation to show that 

$\displaystyle \mathbb{P}([Y]=[X_1]+[X_2]+\cdots+[X_n]) \approx \sqrt{\frac{2}{\pi}}\sin\left(\sqrt{\frac{3}{n}}\right)$

which shows the probability of the sum matching only worsens as $n$ increases.

On an unrelated note, I also noticed that

$\displaystyle \mathbb{E}(IH_n|t \leq IH_n \leq t+1)=(t+1)\frac{n}{n+1}$

Given the simple expression for the expected value, I suspect there is a very nice reason for this but am yet to see it. In case you can, please do share it with us in the comments.

Hope you enjoyed this. See ya soon.

Until then, Yours Aye

Me

No comments:

Post a Comment