Saturday, May 6, 2023

Gaps between cards after a random shuffle

This post is more of a continuation of Probability on a standard deck of cards. I was randomly thinking about some probability problems when I came across The probability of cards meeting after a shuffle. The author of this paper discusses about two problems both of which is solved by simulation. This post aims at giving explicit closed form solutions to both of the said problems.

The first problem deals with the probability of finding cards with the same rank adjacent to other after a random shuffle.

Before I could even give it a thought, I just remembered the connection between this problem and AMBALLS problem in CodeChef. The following Mathematica code based on the tutorial immediately confirms the author's simulation results.

Clear["Global`*"];
(* https://discuss.codechef.com/t/amballs-editorial/1953 *)
n = 13; m = 4;
vals[x_, y_] := 0;
vals[0, 0] = 1;
Do[
    Do[
        Do[
            If[vals[x - 1, y] > 0,
                Do[
                    i = ij - j;
                    z = 1 + 4 (x - 1) - y;
                    vals[x, y - j + m - ij] += vals[x - 1, y] Binomial[y, j] Binomial[z, i] Binomial[m - 1, ij - 1];
                , {j, 0, Min[ij, y]}
                ];
            ];
        , {y, 0, 1 + 4 (x - 1)}
        ];
    , {ij, 0, m}
    ];
, {x, n}
];
vals[n, 0] * Power[Factorial[m], 13] / Factorial[m n]
N[%]
1 - %

I searched for a few other answers for a same question when I found this stack exchange post. The generating function result in the SE post took me by surprise.

Based on the post, we can see that when we have $c_j$ balls of color $j$ encoded in array $\mathbf{c}=(c_1,c_2,c_3,\cdots)$, then the number of ways $W(\mathbf{c})$ of arranging the balls linearly such that no two balls same of the same color are adjacent to each other is given by

$\displaystyle W(\mathbf{c})=\sum_{k_1=1}^{c_1} \sum_{k_2=1}^{c_2}\cdots (-1)^{c-k}\binom{k}{\mathbf{k}}\prod_i \binom{c_i-1}{k_i-1}$

where $\mathbf{k}=(k_1,k_2,k_3,\cdots)$, $k = k_1+k_2+\cdots$, $c=c_1+c_2+\cdots$ and $\binom{k}{\mathbf{k}}$ is the multinomial coefficient.

This gives us a closed form solution. I don't understand the reasoning behind the Generating function approach yet and that will be a subject of a post later. In any case, all these approaches confirm that the probability is about $94.45\%$.

We now come to the second problem of finding the probability that the Kings and Queens are separated by more than one card after a random shuffle. The author (possibly via simulations) suspects that the complementary probability is above 70% without giving a final result.

This question seemed very interesting to me and I couldn't find a clear idea with a casual Google search. Anyway, stripping the problem of labels, this questions boils down to counting the number of words with alphabets $\{a,b,c\}$ such that the $a$'s and $b$'s are atleast two $c$'s apart.

After trying multiple approaches, I realized that the best way to solve this problem is to first solve the following sub-problem: Find the number of words that can be formed with $a$ and $b$ such that we have $k$ change of 'runs'.

For example $aabbb$ has one change of run whereas both $baba$ and $aaabbbbaabbb$ both has three change of runs.

Fortunately, this can be easily solved with Generating functions. Let $\mathcal{G}$ be the language of such words. Then, it is easy to see that

$\displaystyle \mathcal{G}=\mathcal{W}_{a,b}+\mathcal{W}_{b,a}$

where $\displaystyle \mathcal{W}_{a,b}=\mathrm{SEQ}_{\geq 1}(a)\cdot \mathrm{SEQ}(t\mathrm{SEQ}_{\geq 1}(b)t\mathrm{SEQ}_{\geq 1}(a))\cdot(1+t\mathrm{SEQ}_{\geq 1}(b))$

Therefore, the generating function is given by

$\displaystyle G(a,b)=\frac{a+b+2ab-2abt}{1-a-b+ab-abt^2}$

Now $g(m,n,k)=[a^mb^nt^k]G(a,b)$ gives the number of words having $m$ a's, $n$ b's and $k$ change of runs. Note that the special case of $m=n$ is given in A152659.

But the approach above is not easily generalizable. A better way is to use the idea of Smirnov words which counts the number of words in which no adjacent letters are the same. The generating function of Smirnov words is given by

$\displaystyle S(v_1,v_2,v_3,\cdots)=\left(1- \sum_i \frac{v_i}{1+v_i}\right)^{-1}$

If we take a non-empty Smirnov word and use the substitution $v_i \mapsto t v_i/(1-v_i)$, we get the generating function we need with one extra $t$ (which counts the zeroth run). Therefore, we have

$\displaystyle G(v_1,v_2,v_3,\cdots,t)=\sum_i \frac{v_i}{1-v_i+tv_i}\cdot\left(1-t\sum_i \frac{v_i}{1-v_i+tv_i}\right)^{-1}$

We can use the above to solve our problem in greater generality. Let's $g_p(m,n,r)$ be the number of words with $m$ a's, $n$ b's and $r$ c's such that there are atleast $p$ c's between the a's and b's. Then,

$\displaystyle g_p(m,n,r)=\sum_{k=1}^{m+n-1}g(m,n,k)\binom{r-pk+m+n+1-1}{m+n+1-1}=\sum_{k=1}^{m+n-1}g(m,n,k)\binom{r-pk+m+n}{m+n}$

That is, we first arrange the a's and b's such that there are $k$ change of runs. In each of these $k$ changes, we insert $p$ c's. We then have '$r-pk$' c's which we insert in the available $m+n+1$ spaces between the a's and b's.

Therefore,

$\displaystyle \mathbb{P}(\text{Ks and Qs separated by atleast two cards})=\frac{g_2(4,4,44)4!4!44!}{52!}\approx 26.4\%$

This approach also answer's our question from the previous post. We have,

$\displaystyle \mathbb{P}(\text{No Ks and Qs adjacent})=\frac{g_1(4,4,44)4!4!44!}{52!}\approx 51.37\%$

The following Mathematica code was used to calculate the above values.

Clear["Global`*"];
f[a_, b_] := a / (1 - a) (1 + t b / (1 - b)) / (1 - t b / (1 - b) * t a / (1 - a));
g = f[a, b] + f[b, a];
k = 4; q = 4; c = 44;
(* SeriesCoefficient[g, {a, 0, k}, {b, 0, q}, {t, 0, c}] *)
Ways[m0_, n0_, t0_] := Ways[m0, n0, t0] = Module[{m = m0, n = n0, t = t0},
    If[t <= 0, Return[Boole[Xor[m <= 0, n <= 0]]];];
    If[t <= 1, Return[2 Boole[And[m > 0, n > 0]]];];
    If[Or[m <= 0, n <= 0], Return[0];];
    Ways[m - 1, n, t] + Ways[m, n - 1, t] - Ways[m - 1, n - 1, t] + Ways[m - 1, n - 1, t - 2]
];
Sum[Ways[k, q, j] Binomial[c - 2 * j + k + q, k + q], {j, k + q - 1}]
% * Factorial[4] * Factorial[4] * Factorial[44] / Factorial[52]
N[%]
1 - %
Sum[Binomial[44 + 1, m] Binomial[4 - 1, m - 1] Binomial[4 + 44 - m, 4], {m, 4}]
% * Factorial[4] * Factorial[4] * Factorial[44] / Factorial[52]
N[%]
1 - %
Clear["Global`*"];
Factor[(a / (1 - a + a t) + b / (1 - b + b t)) / (1 - t (a / (1 - a + a t) + b / (1 - b + b t)))]

Finally, a more direct generating function would be the following.

$\displaystyle G^*(v_1,v_2,\cdots,c)=\frac{1}{1-c}\cdot G\left(\frac{v_1}{1-c},\frac{v_2}{1-c},\cdots,c^p\right)$

Clear["Global`*"];
f = a / (1 - c - a + a c c) + b / (1 - c - b + b c c);
f = f / (1 - c c f) / (1 - c);
SeriesCoefficient[f, {a, 0, 4}, {b, 0, 4}, {c, 0, 44}]

Until then
Yours Aye
Me