Science exams are typically examples of the difficulty model of assessment where answers are right or wrong (see Christodoulou, Making Good Progress chapter 3). Assessors can use a marking rubric to judge whether an answer reaches a threshold of ‘correctness’ and award or deny a mark.

For many answers, this will be uncontroversial. However, some answers are less wrong than others: some answers are better than wrong.

A colleague and I took a single-sentence question from an end-of-year exam, one that many students got wrong but whose answers weren’t uniformly wrong.

A mark scheme is binary: right or wrong. We wanted to know whether we could do better than that. Could we rank the answers? Could we identify which answers were almost there?

The answer is yes, and pretty reliably, using comparative judgement.

Comparative judgement (CJ) is a method for ranking and scoring pieces of writing by comparing two at a time. All the judge needs to do is decide which of the two is better, and then repeat with another pair. Typically this is done with longer texts (we will try using long answers after the summer), but I was particularly interested in seeing what happens when you compare single sentences.
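
For anyone curious how a pile of "this one is better" decisions can become a score, here is a minimal sketch in Python. It assumes a Bradley-Terry model, which is one common way to turn pairwise judgements into a scale; I am not claiming it is exactly what the CJ tool we used does. The function name `rank_by_comparison`, the `prior` smoothing value and the answer labels are all made up for illustration.

```python
import math
from collections import defaultdict


def rank_by_comparison(judgements, iterations=500, prior=0.5):
    """Rank answers from pairwise judgements (Bradley-Terry sketch).

    judgements: list of (winner, loser) pairs, one per decision a judge made.
    prior: assumed smoothing; every answer gets `prior` wins and `prior`
           losses against a virtual opponent of strength 1, so an answer
           that never wins (or never loses) still gets a finite score.
    Returns a list of (answer, score) pairs, best first.
    """
    answers = sorted({a for pair in judgements for a in pair})
    wins = defaultdict(float)
    met = defaultdict(float)  # how often each pair of answers was compared
    for winner, loser in judgements:
        wins[winner] += 1.0
        met[(winner, loser)] += 1.0
        met[(loser, winner)] += 1.0

    strength = {a: 1.0 for a in answers}
    for _ in range(iterations):
        updated = {}
        for a in answers:
            # Contribution of the virtual opponent (2 * prior comparisons).
            denom = 2.0 * prior / (strength[a] + 1.0)
            for b in answers:
                n = met.get((a, b), 0.0)
                if b != a and n:
                    denom += n / (strength[a] + strength[b])
            updated[a] = (wins[a] + prior) / denom
        strength = updated

    # Report log-strengths, so that differences between scores matter
    # rather than their absolute values.
    scores = {a: math.log(s) for a, s in strength.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical judgements: each tuple is (preferred answer, other answer).
    example = [
        ("answer_1", "answer_2"),
        ("answer_2", "answer_3"),
        ("answer_1", "answer_3"),
    ]
    for answer, score in rank_by_comparison(example):
        print(f"{answer}: {score:.2f}")
```

The judges never assign a mark; the scores come entirely from who beat whom, which is why the method copes with answers that a binary mark scheme would lump together.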

You can try for yourself here.

Typically, for longer pieces of writing, you get a fairly even distribution of scores. This didn’t happen here.

[Figure: CJ task #1, Better than Wrong]

The results show three populations of answers: right, wrong and better than wrong.


  1. For some questions, students should simply learn the approved answer. Definitions are a good example of this.
  2. Binary right/wrong answers often fail to pinpoint students’ understanding/misunderstanding.
  3. To do this process routinely for individual questions would be too time-consuming, but it was a short, instructive activity for teachers before discussing better ways of teaching the concept.

Thank you to all of the additional teachers who helped with the judgements.

The data is here.


One thought on “Better than Wrong”

  1. Thank you for initiating this, Ben, it was an interesting exercise and did make me think about the question and what the mark scheme might be!
    We might give 1 or 2 marks for the question – depending partly on the target grade, and your bar chart reflects that.


