If we choose $k$ paintings at random, what's the expected number of colors that we get? What's the maximum and minimum value of this expectation, for a fixed color-painting configuration, but for varying choices of paintings?

If we choose $l$ colors at random, what's the expected number of paintings that can be replicated with those colors? (It means each painting is painted only with those colors). Again, what's the maximum and minimum value of this expectation?

Remark: by choosing $m,n,s,k,l$ judiciously we can write problems such as: show that we can choose $k$ paintings such that the number of colors among those paintings is at most / at least $x$.

**Solution**

There are $N = {n \choose k}$ ways to choose paintings. For each choice of paintings, we tally up the total number of colors produced, and then we sum it over all possible choice of paintings. Let $S$ be this sum of total number of colors. The expected number of colors is then $S / N$.

Suppose the color $c$ is used by $p_c$ paintings. We wish to count the contribution of $c$ towards $S$. The color contributes one point to $S$ for each choice of paintings that includes at least one of the $p_c$, and zero otherwise. That means now the question is, how many ways can we choose $k$ out of $n$ paintings such that at least one of them comes from the $p_c$ paintings? There are $N$ ways total to choose, and there are ${n-p_c \choose k}$ ways to choose such that none of the $p_c$ paintings are chosen, so the total number is: $${n \choose k} - {n-p_c \choose k}$$ So: $$S = \sum_c {n \choose k} - {n-p_c \choose k}$$ where the sum runs over all $m$ colors. This expression is exact. What follows from this point on is an attempted to upper/lower bound $S$, and it may not be exact, although we will give examples to show that the bound is tight.

Now consider the function $f(t) = {n \choose k} - {n-t \choose k}$, defined over $t \in [1,m]$ (the color must be used in at least 1 painting, and at most $m$ paintings obviously). We are given that $m << n$ so it's safe to assume that ${n-m \choose k} > 0$. The convexity and concavity analysis depends on if $k$ is even or odd. For now, assume $k$ is odd. The analysis if $k$ is even is very similar.

The function $(n-t)(n-t-1)\dots (n-t-k+1)$ is convex in the interval $(0,n-k-1)$, and since $m << n$, it is convex in the interval $[1,m]$. That means $f(t)$ is concave. On the other hand, we know that the average number of colors per painting is $s$, so there are $sn$ distinct painting-color pairs, so $\sum_c p_c = sn$.

By Jensen, the maximum of $\sum_c f(p_c)$ is achieved if all the $p_c$ is the same, that is $sn/m$. In that case, $$\frac{S}{N} \leq \frac{\sum_c {n \choose k} - {n-p_c \choose k}}{n \choose k} = \frac{ m({n \choose k} - {n-\frac{sn}{m} \choose k}) }{n \choose k}$$ For example, for $n=50,k=3,m=25,s=10$, we have $S/N = 19.8$. So we can write a problem such as: show that we can choose 3 paintings such that between them we get at most 19 colors.

In order to find the lower bound, by majorization theorem, the minimum of $\sum_c f(p_c)$ is achieved by $[p_c] = [m,m,m,\dots,m,r,1,\dots,1]$ because that's the configuration that majorizes everything else. The number of $m$s, and $1$s and the value of $r$ (if needed) will have to depend on the particular problem setup, to make sure the following two conditions are met:

1. $\sum_c p_c = sn$ 2. The number of elements in $[p_c]$ is $m$.

For example, in the $n=50,k=3,m=25,s=10$ case above, we would have $[p_c] = [25,25,\dots,25,20,1,1,\dots,1]$ where there are 19 25s and 6 1s. Then: $$S \geq 25 \times {50 \choose 3} - 19 \times {50-25 \choose 3} - 1 \times {50-20 \choose 3} - 6 \times {50-1 \choose 3}$$ $$\frac{S}{N} \geq 16.9$$ So we can say: show that we can choose 3 paintings such that between them we get at least 17 colors.