Pages

Bookmark and Share

Wednesday, May 8, 2019

Questions in an exam

A test is taken by 2019 students. The test consists of $n$ questions (where $n > 0$ is even), where each question is a true/false problem worth 1 point. It is known that:

1. Each student answered exactly half of the questions correctly.

2. Given two students, their answers agreed on exactly half of the questions.

Find the smallest $n$ such that this situation is possible.

Solution

If $n = 2^k m$ where $m$ is odd, then there are at most $2^k - 1$ students. Therefore, with 2019 students, we need $2048$ questions for this situation to be possible. Proof:

Suppose $m=1$, we will give a configuration of $2^k-1$ students that satisfy both conditions. First of all, given a number $b$, let $f(b)$ denote the number of 1s in its binary representation, modulo 2. That is $f(b)=0$ if $b$ contains an even number of 1s in its binary representation, and $f(b) = 1$ otherwise.

Label the students $1$ to $2^k-1$ and consider their binary representation (from $0...01$ to $1...1$, all having exactly $k$ digits). Also label the problems $0$ to $2^k-1$ and consider their binary representation (from $0...0$ to $1...1$, all having exactly $k$ digits). We set student $s$ answered problem $p$ correctly if $f(p \wedge s) = f(s)$, where $\wedge$ denotes bitwise AND operation between $p$'s binary representation and $s$' binary representation. In other words, if the number of digits where $p$ and $s$ are both 1 in their binary representation is the same parity as the number of 1s in $s$' representation, then student $s$ answered problem $p$ correctly.

For example, student $5=0...101$ answered question $2=0...010$ correctly because $f(5 \wedge 2) = f(0...000) = 0 = f(5)$. But student $7 = 0...111$ answered question $3 = 0...011$ incorrectly because $f(7 \wedge 3) = f(111 \wedge 011) = f(011) = 0 \neq 1 = f(7)$.

From there, it's easy to see that condition 1 is satisfied. A student's correct answers depends only on the digits of $s$ that are 1, and there are $2^{f(s)}$ possible combinations of those digits. Out of those, there will be half that has even number of 1s and half that has odd number of 1s, so that student answers half the questions correctly.

To prove condition 2, consider student $s$ and $t$. Consider $r = s \oplus t$ where $\oplus$ denotes the bitwise XOR operation. In other words, $r$ is a number whose binary representation is 1 at the digits where $s$ and $t$ are different, and 0 at the digits where $s$ and $t$ are the same. The problems where $s$ and $t$'s answers agree are exactly the problems where student $r$ answered correctly. To prove that assertion, note that $f$ is distributive with respect to $\oplus$. That is $f(a \oplus b) = f(a) \oplus f(b)$. This is because if $a$ has $d_a$ number of 1s and $b$ has $d_b$ number of 1s then $a \oplus b$ has $d_a + d_b \mod 2$ number of 1s, because the digits that were both 1s and turned into zero always come in pairs.

So suppose both $s$ and $t$ get problem $p$ right, then $p \wedge r = p \wedge (s \oplus t) = (p \wedge s) \oplus (p \wedge t)$ so $f(p \wedge r) = f(p \wedge s) \oplus f(p \wedge t) = f(s) \oplus f(t) = f(s \oplus t) = f(r)$ and similarly if they both get the problem wrong. We can show that if $s$ gets problem $p$ right and $t$ gets it wrong, then $r$ gets problem $p$ wrong as well. That way, we showed that student $s$ and $t$'s answers agree exactly on the problems that student $r$ gets right, and by condition 1, this is exactly half of all the problems. So condition 2 is proven.

Now for $m > 1$, let $l = 2^k$ so $n = lm$, we simply assign student $s$ answering problem $p$ correctly if and only if student $s$ answers problem $(p \mod l)$ correctly. In essence, we are duplicating problems $0,\dots, l-1$'s answers (which were constructed above) $m$ times. Condition 1 and 2 are still satisfied.

The second part of the proof is to show that this arrangement is optimal. That is, with $n = 2^km$ there is at most $2^k-1$ students.

For $n$ to be even then $k \geq 1$. For $k = 1$ we show that there is at most 1 student. That is, condition 1 can be fulfilled easily, but condition 2 cannot be fulfilled. Indeed, WLOG we may assume that student $A$ answered questions $1,\dots,m$ correctly and $m+1,\dots,2m=n$ incorrectly. If there is a second student $B$, then suppose above questions $1,\dots,m$ he answered $t$ questions correctly, then he must answer $m-t$ questions among the second half correctly. However, that means the number of questions that $A$ and $B$ agreed on is $t + (m- (m-t)) = 2t \neq m$, contradiction. So there is at most $2^1 - 1 = 1$ student for $k = 1$.

Now we use induction on $k$. Suppose that $k > 1$ and let $h = n/2 = 2^{k-1}m$. Take student $A$, and WLOG assume that he answered questions $1, \dots, h$ correctly (i.e. the first half). For any other student $B$, suppose he answers $t$ questions correctly on the first half, so $h-t$ questions correctly from the second half. The number of questions that $A$ and $B$ agreed on is $t + (h-(h-t)) = 2t = h$, so student $B$ answered $h/2$ questions correctly on the first half. This means all students other than $A$ must answer $h/2$ questions correctly on the first half.

Also note that if we replace $B$ with $B'$ whose answers are completely opposite of $B$, then condition 1 and 2 are both still preserved. $B'$ still answered half the questions, and the questions that $B$ agreed with other students are just "flipped." However, $B$ and $B'$ cannot both coexist because obviously condition 2 would be violated.

So now suppose that there are two students $B,C$ distinct from $A$. It's possible that $B$ and $C$ agree completely on the first half (and disagree completely on the second half). We see this a lot in the construction above. In that case, we call $B$ and $C$ "friends", and we put them under one "group." Also, if $B$ and $C'$ would have been friends, we still call $B$ and $C$ friends (as in, if $B$ and $C$ disagree completely on the first half but agree completely on the second half, we still call $B,C$ friends). Note that because $C$ and $C'$ cannot both coexist, we are not claiming that $B$ can be friends with two existing students. Indeed we shall show that $B$ is friends with at most one other person.

If $B$ and $C$ are friends, and $B$ and $D$ are friends. WLOG we may assume that $B,C,D$ agree completely on the first half (otherwise replace $C$ by $C'$ or $D$ by $D'$ or both). Suppose :
1. $X$ is the set of problems that $A$ got right, $B$ got right
2. $Y$ the set of problems $A$ got right, $B$ got wrong
3. $Z$ the set of problems $A$ got wrong, $B$ got right
4. $W$ the set of problems $A$ got wrong, $B$ got wrong

So $X \cup Y$ is the first half, and $Z \cup W$ is the second half. Because $C$ agree with $B$ on the first half, he must disagree completely on the second half. That means $C$ got $X \cup W$ completely right, and $Y \cup Z$ completely wrong. But we my apply the same argument to $D$, therefore $D$ got $X \cup W$ completely right and $Y \cup Z$ completely wrong. This violates condition 2 because now $C,D$ agree everywhere. Contradiction. Therefore, each group of friendship contains at most 2 students. (Some students may not have friends, so they are in a group each by himself).

Next, suppose $B$ and $C$ are not friends. Consider the answers of $C$ in $X,Y,Z,W$ that agree with $B$. Suppose there are $t_X, t_Y, t_Z, t_W$ answers that agree with $B$ respectively. Now by condition 2, the total number of agreement between $B$ and $C$ must be $h$, so: $$t_X + t_Y + t_Z + t_W = h$$ By condition 1, the total number of answers that $C$ got right must be $h$, so: $$t_X + (|Y| - t_Y) + t_Z + (|W| - t_W) = h$$ And note that $|Y| + |W|=h$ because those are the questions that $B$ got wrong. So the second equation can be written as: $$t_X + t_Z = t_Y + t_W$$ That means $t_X + t_Z = t_Y + t_W = h/2$. But also, $C$ must agree with $A$ for $h$ questions, so: $$t_X + |Y| - t_Y + |Z| - t_Z + t_W = h/2$$ But $|Y| + |Z| = h/2$ because those are the questions that $B$ disagreed with $A$, so $t_X + t_W = t_Y + t_Z = h/2$ Solving the two equations, we have $t_X = t_Y$. Next, because $C$ must answer $h/2$ questions correctly on the first half, then $t_X + |Y|-t_Y = h/2$, so $|Y| = h/2$, which means $|X| = |Y| = |Z| = |W| = h/2$. we get $t_X = t_W = t_Z = t_Y = h/4$. Because $k>1$ and $h = 2^{k-1}m$ then $h/4$ is still an integer. Therefore $t_X + t_Y = h/2$. So the conclusion here is that if $B$ and $C$ are not friends, then $C$ agreed with $B$ on exactly $h/2$ on the first half, and $h/4$ on the second half.

Suppose there are $2^k$ students, and because $A$ is one of them, then there are $2^k-1$ "non-A" students. Because each group contains at most 2 students, then there are at least $\lceil (2^k-1)/2 \rceil = 2^{k-1}$ groups. Take any student from each group to be the "representative" of that group. Now, consider their answers only on the first half of the exam. These answers satisfy the condition for the exams with $n/2 = h$ questions. Indeed, condition 1 is satisfied because all students other than $A$ must answer $h/2$ questions correctly. Condition 2 is also satisfied because every two students who are not friends must agree exactly $h/2$ on the first half. Therefore, now we have at least $2^{k-1}$ students (excluding student $A$) with $h = 2^{k-1}m$ problems that satisfy conditions 1 and 2, violating our induction hypothesis. Contradiction. So there must be at most $2^k-1$ students in an exam with $2^km$ problems.

Finally, because $2^{10} < 2019 < 2^{11}$, then in order for this situation to happen with 2019 students, $k \geq 11$. That means the least number of problems is $2^{11}.1 = 2048$.