Pages

Bookmark and Share

Tuesday, March 1, 2022

Number of contiguous blocks containing black balls over all permutations

$k$ black balls and $n-k$ white balls are arranged on a straight line, where $n \geq 1$, $0 \leq k \leq n$. The score of the said arrangement is the number of contiguous subsequences that contain at least one black ball. For example, the arrangement $W W B W B$ has two subsequences of length 1, and three subsequences of length 2 that contain at least one black ball. Of course there are subsequences of length 3 or higher as well, so the total score for this arrangement is 11.

What is the sum of the score of all $\binom{n}{k}$ arrangements? You may express this quantity in a recursive formula computable in $O(n^2)$ time, $O(n \log n)$ time, or $O(n)$ time.

Solution 1

We give a recursion formula computable in $O(n^2)$ time with DP.

More to come...

Solution 2A

We prove that the number is $\binom{n+2}{k+2} . \binom{k+1}{2}$ which is computable in $O(n)$ time simply by calculating the factorial table

First let $T_n$ be the $n$-th triangular number, $T_n = n(n+1)/2 = \binom{n+1}{2}$. And we also will use the following well-known identity a lot: $$\sum_{i=1}^{n} \binom{i+p-1}{p} = \binom{n+p}{p+1}, p \geq 0$$ It can be proven many ways, either with induction on $n$ keeping $p$ fixed, induction on $p$ keeping $n$ fixed, or telescoping sum, or using combinatorial arguments (choosing $p+1$ items, and condition on the first item chosen). For $p = 0$ it is a trivial sum, and for $p = 1$ it is the definition of triangular numbers.

Note that if all balls were black, then every single contiguous blocks would contribute to the score. There are $1 + 2 + \dots + n = T_n$ such blocks and that is the score. But if there are white balls, then we want to calculate how many scores are lost due to these pockets of white balls, over the course of all different arrangements.

Now given a particular arrangement of $k$ black balls and $n-k$ white balls, we come with an associated sequence of $k+1$ integers denoting the number of white balls in between black balls, including before the first black ball and after the last black ball. For example, the sequence WWBWWWBBBW corresponds to the sequence $(2, 3, 0, 0, 1)$. Each element in this sequence must be non-negative and they must add up to $n-k$, and there must be exactly $k+1$ integers.

Lemma

Suppose $A$ is the set of all sequences of non-negative integers with length $m \geq 2$ that sum up to $M$, that is $$ A = \{ (a_1, \dots, a_m) | a_i \geq 0, a_1 + \dots + a_m = M \}$$ then:

1. There are $|A| = \binom{M+m-1}{m-1}$ possible sequences.

2. If $m \geq 2$, the number $t \in [0,M]$ occurs $m \binom{M-t+m-2}{m-2}$ times in $A$, in any position.

We prove by induction. For $m=2$, the possible sequences are $A = \{(M,0), (M-1,1,), \dots, (0,M)\}$ so $|A| = M+1$ and there are two occurrences of each number.

Now suppose it holds for $m$, now for $m+1$, consider the first element $a_1 \in [0,M]$. Once its value is chosen, then we have $m-1$ numbers that need to sum up to $M - a_1$. So there are $\binom{M-a_1+m-2}{m-2}$ ways to do this, by induction hypothesis. The total number of ways is the sum of that expression when $a_1$ ranges from $0$ to $M$. $$\sum_{a_1 = 0}^M \binom{M-a_1+m-2}{m-2} = \sum_{i=m-2}^{M+m-2} \binom{i}{m-2} = \sum_{j=1}^{M+1} \binom{j+(m-2)-1}{m-2} = \binom{M+m-1}{m-1}$$ The first two equalities are change of variables, but the last one utilizes the identity above. We have proved the first assertion. For the second assertion, consider the case where the first element $a_1 = t$. The rest of the $m-1$ elements must sum up to $M-t$, so there are $\binom{M-t+m-2}{m-2}$ ways to complete the sequence. However, the number $t$ may appear in any of the other positions as well, so in total there are $m$ times that amount.

One might ask that sometimes the number $t$ itself my appear as a duplicate in multiple elements. For example for $m=4$ and $M=11$, we have have a sequence $(2,2,5,2)$ which should only be counted once. Indeed, when we consider $a_1=1$, we contributed one count towards the total number of "2" appearing. Likewise, when we consider $a_2=2,a_4=2$ we also added one count each time, but the sequence itself contributes three counts towards the appearance of "2." So the total amount is exact and correct, and there is no need to compensate for double counting. Thus the lemma is proven

Alternatively, one could also prove the first assertion by using a combinatorial argument. Consider $M+m-1$ spaces and we are to choose $m-1$ spaces to serve as separators. The spaces between the separators will be the numbers in the sequence.

Now observe that each arrangement corresponds one-to-one with a sequence like we described above. The process to convert from an arrangement to a sequence is invertible. Therefore, the total number of contiguous blocks in all arrangements is $$ T_n \binom{(n-k)+(k+1)-1}{(k+1)-1} = T_n \binom{n}{k} $$ But a lot of these blocks would contribute zero towards the total score. That is, for each sequence $(a_1, \dots, a_{k+1})$, the total loss of scores is the number of contiguous blocks that are contained fully within those pockets of white balls, that is: $$T_{a_1} + \dots + T_{a_{k+1}}$$ So we have to add these over all the possible sequences whose sum is $n-k$. This is where we utilize the lemma. Each time the number $t$ appears, it contributes $T_t$ towards the total loss of score. But thanks to the lemma, we know that the number $t$ appears exactly $$(k+1) \binom{(n-k)-t+k-1}{k-1} = (k+1) \binom{n-t-1}{k-1}$$ times. So the total loss of scores is the sum of this quantity as $t$ ranges from one to $n-k$. (Technically $t=0$ is a valid scenario but it incurs no loss of score because $T_0 = 0$ so we ignore it). $$\sum_{t=1}^{n-k} (k+1) \binom{n-t-1}{k-1} T_t $$ At this point, this sum can be computed in $O(n \log n)$ times using FFT technique, as it's a sum of terms expressed in $t$ and $n-t$, but we can simply the terms further to arrive at a closed form solution, by repeated use of the identity above. Let $B_i = \binom{i+k-2}{k-1}$ (where $k > 0$ is fixed), and for any $m > 0$: $$ B_1 + \dots + B_m = \sum_{i=1}^{m} \binom{i+k-2}{k-1} = \binom{m+k-1}{k}$$ by direct application. Then: $$ \sum_{i=1}^{m} (m+1-i). \binom{i+k-2}{k-1} = m.B_1 + (m-1).B_2 + \dots + 1.B_{m} = B_1 + (B_1+B_2) + \dots + (B_1 + \dots + B_m) = \sum_{i=1}^{m} \binom{i+k-1}{k} = \binom{m+k}{k+1}$$ One more time: $$ \sum_{i=1}^{m} T_{m+1-i}. \binom{i+k-2}{k-1} = T_m.B_1 + T_{m-1}.B_2 + \dots + 3.B_{m-1} + 1.B_{m} $$ $$ = [1+2+ \dots + m]B_1 + [1+2+ \dots + (m-1)]B_2 + \dots + [1+2]B_{m-1} + 1.B_m$$ $$ = 1.B_1 + (2.B_1 + 1.B_2) + (3B_1 + 2B_2 + B_3) + \dots + (m.B_1 + \dots + 1.B_m)$$ $$ =\sum_{i=1}^m \binom{i+k}{k+1} = \binom{m+k+1}{k+2}$$ Now, the total loss of scores becomes: $$ \sum_{t=1}^{n-k} (k+1) \binom{n-t-1}{k-1} T_t = (k+1) \sum_{i=1}^{n-k} T_{n-k+1-i} \binom{i+k-2}{k-1}$$ due to change of variable. Now we apply the latest identity above, with $m=n-k$: $$ = (k+1) \sum_{i=1}^{n-k} T_{n-k+1-i} B_i = (k+1) \binom{n+1}{k+2}$$

So the total score is then: $$ T_n \binom{n}{k} - (k+1) \binom{n+1}{k+2} = \frac{n(n+1) n!}{2. k!(n-k)!} - \frac{(k+1)(n+1)!}{(k+2)!(n-k-1)!} = \frac{(n+1)!}{k!(n-k-1)!} \left(\frac{n}{2(n-k)} - \frac{(k+1)}{(k+2)(k+1)} \frac{n-k}{n-k} \right)$$ $$ \frac{(n+1)!}{k!(n-k)!} \left(\frac{n}{2} - \frac{n-k}{k+2} \right) = \frac{(n+1)!}{k!(n-k)!} \frac{kn+2n-2n+2k}{2(k+2)} = \frac{(n+1)!}{k!(n-k)!} \frac{k(n+2)}{2(k+2)}.\frac{k+1}{k+1} = \frac{(n+2)!}{(k+2)!(n-k)!} \frac{k(k+1)}{2} = \binom{n+2}{k+2} . \binom{k+1}{2}$$

Solution 2B

We also prove that the number is $\binom{n+2}{k+2} . \binom{k+1}{2}$ but using a different (simpler) combinatorial argument.

Given $n+2$ tokens in row, mark $k+2$ of these as blue and $n-k$ as red. There are $\binom{n+2}{k+2}$ ways to choose. And now, among the blue tokens, further mark 2 tokens as "start" and "end", with the condition that "start" must come before "end", and there must be at least one blue token between them. There are $\binom{k+2}{2} - (k+1) = \binom{k+1}{2}$ ways to choose. So the end result is an arrangement of red and blue tokens, and two tokens marked start and end containing at least a blue token in between. We will show that this arrangement corresponds to exactly one score-producing contiguous block in a particular arrangement of black and white balls, and that the mapping is invertible and thus we obtain a bijection.

Arrange the balls according to the $n$ unmarked tokens, where all $n-k$ red tokens correspond to white balls, and $k$ unmarked blue tokens correspond to black balls. The contiguous block is represented by the tokens between start and end, including all red and blue tokens. Because the start and end must flank at least one blue token, then this contiguous block contributes one point towards the score.

On the reverse, given an arrangement of balls, create the corresponding arrangement of red and blue tokens. And then insert the start token to the space right before the first ball in the contiguous block, and the end token to the space right after the last ball. Thus we have a bijection, and the number of total score is $\binom{n+2}{k+2} . \binom{k+1}{2}$.

Solution 3

We present a different combinatorial argument, this time in the permutation space, not a combination space.

First we label all the balls from $1,\dots,n$ such that all the black balls are labeled $1,\dots,k$ and the white balls are labeled $k+1, \dots, n$. Consider any given permutation of the balls, and define the score of this permutation the same way as the problem statement: the number of contiguous blocks that contain at least one black balls. Then we also consider the total number of scores of all $n!$ permutations.

Note that each arrangement of unlabeled balls corresponds to $k! (n-k)!$ permutations of labeled balls, all having the same score. So the total number of scores of all $n!$ permutations is $k! (n-k)!$ times the total score of all arrangements. We will show that the total number of scores of all permutations is $\frac{(n+2)!}{2} \frac{k}{k+2}$.

First we arrange the items $1,\dots,n, X, Y$ in a row, with the condition that $X$ happens before $Y$. There are $\frac{(n+2)!}{2}$ ways to do so. This permutation corresponds to permutation of balls, and the contiguous block is represented by the items between $X$ and $Y$. There may be some permutations where $X$ and $Y$ are next to each other, which means the contiguous block is empty, but that's okay for now. Because we have conditioned that $X$ happens before $Y$, the mapping is injective, meaning no two item permutation map to the same (ball permutation, contiguous block) tuple.

However, not all of those permutations will produce a score. In order for it to produce a score, there must be at least one of $1,\dots,k$ between $X$ and $Y$. So now we consider the permutations of these $k+2$ items. Each of the $n+2$-item permutations has a unique corresponding $k+2$-item permutation, but every $(n-k)!$ of the large permutations produces the same smaller permutation.

There are $\frac{(k+2)!}{2}$ smaller permutations because they still all have $X$ before $Y$. We want to exclude the ones where $X$ and $Y$ are together in this permutation (even if in the original permutations there may be numbers larger than $k$ between them). There are $(k+1)!$ such permutations. So the remaining permutations are: $$\frac{(k+2)!}{2} - (k+1)! = \frac{(k+2 - 2}{2} (k+1)! = \frac{k}{2} (k+1)! = \frac{k}{k+2} \frac{(k+2)!}{2} $$ In other words, the good permutations are only $k/(k+2)$ of the total number of permutations, when we consider all small permutations. But since all large permutations map to the small permutations evenly ($(n-k)!$ to one), then the number of good large permutations is $\frac{(n+2)!}{2} \frac{k}{k+2}$. Then, we map it back to the space of ball arrangement, where each permutation corresponds to $k!(n-k)$ arrangements, so the total number of score of arrangements is: $$\frac{(n+2)!}{2} \frac{k}{k+2} . \frac{1}{k!(n-k)!} .\frac{k+1}{k+1} = \frac{(n+2)!}{(n-k)!(k+2)!} \frac{(k+1)k}{2} = \binom{n+2}{k+2} \binom{k+1}{2}$$

No comments:

Post a Comment