Tuesday, February 9, 2016

Paintings and colors

In a museum there are $n$ paintings, each of which was painted with at least 3 colors. The total number of colors from all paintings is $m << n$. The average number of colors per painting is $s$.

If we choose $k$ paintings at random, what's the expected number of colors that we get? What's the maximum and minimum value of this expectation, for a fixed color-painting configuration, but for varying choices of paintings?

If we choose $l$ colors at random, what's the expected number of paintings that can be replicated with those colors? (It means each painting is painted only with those colors). Again, what's the maximum and minimum value of this expectation?

Remark: by choosing $m,n,s,k,l$ judiciously we can write problems such as: show that we can choose $k$ paintings such that the number of colors among those paintings is at most / at least $x$.


There are $N = {n \choose k}$ ways to choose paintings. For each choice of paintings, we tally up the total number of colors produced, and then we sum it over all possible choice of paintings. Let $S$ be this sum of total number of colors. The expected number of colors is then $S / N$.

Suppose the color $c$ is used by $p_c$ paintings. We wish to count the contribution of $c$ towards $S$. The color contributes one point to $S$ for each choice of paintings that includes at least one of the $p_c$, and zero otherwise. That means now the question is, how many ways can we choose $k$ out of $n$ paintings such that at least one of them comes from the $p_c$ paintings? There are $N$ ways total to choose, and there are ${n-p_c \choose k}$ ways to choose such that none of the $p_c$ paintings are chosen, so the total number is: $${n \choose k} - {n-p_c \choose k}$$ So: $$S = \sum_c {n \choose k} - {n-p_c \choose k}$$ where the sum runs over all $m$ colors. This expression is exact. What follows from this point on is an attempted to upper/lower bound $S$, and it may not be exact, although we will give examples to show that the bound is tight.

Now consider the function $f(t) = {n \choose k} - {n-t \choose k}$, defined over $t \in [1,m]$ (the color must be used in at least 1 painting, and at most $m$ paintings obviously). We are given that $m << n$ so it's safe to assume that ${n-m \choose k} > 0$. The convexity and concavity analysis depends on if $k$ is even or odd. For now, assume $k$ is odd. The analysis if $k$ is even is very similar.

The function $(n-t)(n-t-1)\dots (n-t-k+1)$ is convex in the interval $(0,n-k-1)$, and since $m << n$, it is convex in the interval $[1,m]$. That means $f(t)$ is concave. On the other hand, we know that the average number of colors per painting is $s$, so there are $sn$ distinct painting-color pairs, so $\sum_c p_c = sn$.

By Jensen, the maximum of $\sum_c f(p_c)$ is achieved if all the $p_c$ is the same, that is $sn/m$. In that case, $$\frac{S}{N} \leq \frac{\sum_c {n \choose k} - {n-p_c \choose k}}{n \choose k} = \frac{ m({n \choose k} - {n-\frac{sn}{m} \choose k}) }{n \choose k}$$ For example, for $n=50,k=3,m=25,s=10$, we have $S/N = 19.8$. So we can write a problem such as: show that we can choose 3 paintings such that between them we get at most 19 colors.

In order to find the lower bound, by majorization theorem, the minimum of $\sum_c f(p_c)$ is achieved by $[p_c] = [m,m,m,\dots,m,r,1,\dots,1]$ because that's the configuration that majorizes everything else. The number of $m$s, and $1$s and the value of $r$ (if needed) will have to depend on the particular problem setup, to make sure the following two conditions are met:

1. $\sum_c p_c = sn$ 2. The number of elements in $[p_c]$ is $m$.

For example, in the $n=50,k=3,m=25,s=10$ case above, we would have $[p_c] = [25,25,\dots,25,20,1,1,\dots,1]$ where there are 19 25s and 6 1s. Then: $$S \geq 25 \times {50 \choose 3} - 19 \times {50-25 \choose 3} - 1 \times {50-20 \choose 3} - 6 \times {50-1 \choose 3}$$ $$\frac{S}{N} \geq 16.9$$ So we can say: show that we can choose 3 paintings such that between them we get at least 17 colors.

Dancing partners

In a party with $n$ boys and $n$ girls, each boy dances with some girls and vice versa. (No pair of boy and girl dance more than once). There are $n^2-n+1$ dances that occurred throughout the night. Show that we can pair each boy with each girl so that everyone is paired with someone he/she danced with.


The key fact is because the number of dances is more than $n^2-n$. If it were exactly $n^2-n$, we could have a situation where $n-1$ boys danced with all the girls, and one boy never danced, and this effectively blocks that boy from being paired correctly.

Suppose that there is no way to pair them correctly. That is, for every given pairing, there's always a pair that did not dance. For each pairing, we define the score of said pairing to be the number of pairs that actually danced. This means a legal pairing is the one with score $n$.

Now let $S$ be the sum of all the scores of all possible pairings. Our contrapositive hypothesis is that all pairings have a score of at most $n-1$, so $S \leq (n-1)n!$ (because there are $n!$ possible pairings).

On the other hand, for each dance that happened throughout the night, we want to count how many points it contributes to $S$. Suppose the dance happens between boy $B$ and girl $G$. Then it contributes one point of score whenever the pairing pairs $B$ and $G$. In other words, there are $(n-1)!$ such pairings (because one pair is fixed and we have complete freedom to pair the other $n-1$ boys and $n-1$ girls. So the total number of score is: $$S = (n^2-n+1)(n-1)! > (n^2-n)(n-1)! = (n-1)n!$$ A contradiction.