At Paradigm Trust we have developed an assessment programme based on comparative judgement combined with multiple choice questions.
The reasoning for using this combination comes largely from Daisy Christodoulou and Evidence Based Assessment (see links at the end of this post). We find it allows us to assess knowledge and reasoning across the curriculum.
We have been using these assessments for a year now and have enough data to evaluate them. NoMoreMarking provide us with the data we need to judge the quality of our comparative judgement assessments (see here). We need something similar for our multiple choice questions.
There are two simple statistics we use: item difficulty and item discrimination.
- Item difficulty tells you how difficult the learners found the question. It is calculated by dividing the number of incorrect responses by the total number of responses. A difficulty of 1 means that no one answered the question correctly; a difficulty of 0 means that everyone got it right.
- Item discrimination tells you how effectively the question separates the high scorers from the low scorers. A discrimination of 1 means that none of the low attainers answered the question correctly while all of the high attainers did. Discrimination is calculated by splitting the cohort into three equal groups by total score. Count the correct responses to the question among the high attainers and subtract the correct responses among the low attainers, then divide by the number of learners in one group (i.e. a third of the total cohort).
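The two statistics above can be sketched in a few lines of Python. This is a minimal illustration, not our production analysis: it assumes responses are stored as a 0/1 matrix (one row per pupil, one column per question), and all names are illustrative.

```python
# Sketch of item difficulty and item discrimination, assuming a 0/1
# response matrix: rows = pupils, columns = questions.

def item_difficulty(responses):
    """Proportion of incorrect responses per question (1 = no one correct)."""
    n_pupils = len(responses)
    n_items = len(responses[0])
    return [1 - sum(row[q] for row in responses) / n_pupils
            for q in range(n_items)]

def item_discrimination(responses):
    """Per question: top-third correct count minus bottom-third correct
    count, divided by the size of one third of the cohort."""
    ranked = sorted(responses, key=sum)            # low scorers first
    third = len(ranked) // 3
    low, high = ranked[:third], ranked[-third:]
    return [(sum(row[q] for row in high) - sum(row[q] for row in low)) / third
            for q in range(len(responses[0]))]

# Toy cohort: six pupils answering three questions
responses = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
]
print([round(d, 2) for d in item_difficulty(responses)])  # [0.17, 0.5, 0.83]
print(item_discrimination(responses))                     # [0.5, 1.0, 0.5]
```

In the toy data the middle question gets a discrimination of 1: both high attainers answered it correctly and neither low attainer did.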
The chart above shows that question 2 is the most difficult, but its discrimination is low. This is because only a few of the highest scoring pupils answered question 2 correctly (which is fine). Easy questions also get low discrimination because everyone gets them right (also fine).
Questions 1, 9 and 15 look like great questions. They usefully discriminate between the high and low scorers. I’d take a close look at questions 3 and 6 – they are of medium difficulty but don’t discriminate well. They could be badly worded or confusing.
The ideal multiple choice assessment looks something like this:
The bulk of the questions discriminate between the high and low scorers, which is what you need.
Another useful evaluation tool is the scalogram:
- Colour-code the correct and incorrect responses using conditional formatting.
- Sort the scores.
- Drag each column so the high difficulty questions are on the right and the low difficulty questions are on the left.
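The three steps above are easy to do by hand in a spreadsheet, but for a rough sketch of the same idea in code, something like the following works. It again assumes a 0/1 response matrix and hypothetical names; sorting rows by total score and columns by difficulty is exactly the manual procedure described.

```python
# Sketch of the scalogram sort, assuming a 0/1 response matrix
# (rows = pupils, columns = questions).

def scalogram(responses):
    """Rows sorted by total score (lowest scorer first) and columns
    sorted so the easiest question sits on the left."""
    # Easier questions attract more correct answers, so sorting columns
    # by correct-count descending puts low-difficulty items on the left.
    col_order = sorted(range(len(responses[0])),
                       key=lambda q: -sum(row[q] for row in responses))
    rows = sorted(responses, key=sum)
    return [[row[q] for q in col_order] for row in rows]

# Toy data: four pupils, three questions
responses = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]
for row in scalogram(responses):
    print(row)
```

On this toy data the result is a clean staircase: each pupil answers a prefix of the easiest questions correctly and nothing beyond it, which is what the ideal scalogram looks like.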
An ‘ideal’ MCQ assessment would look something like this:
The highest scoring pupils at the bottom answer all of the questions correctly. The middle scoring pupils get the easier questions right and the harder ones wrong.
The real scalogram above shows a problem with question 1: the lowest scoring pupils got it right, but the medium scoring pupils got it wrong.
Questions 2, 5 and 3 would also be worth closer inspection, as looking down each column there is a lot of noise.
Question 9 works well, with only a few odd responses.
Our Paradigm subject groups discuss this analysis to improve the questions for next time. We also discuss an analysis of what the MCQs tell us about our teaching. I will write a post on this next.
Also coming soon: I will describe how we have developed a more sophisticated approach to evaluating our multiple choice assessments based on the Rasch model.
Thank you for the feedback on my previous two posts on Exploring Comparative Judgement Data With Charts and Communicating Assessment Scores with Parents and Carers – this is fast becoming my Paradigm Trust Assessment and Reporting series.
Further reading on MCQs:
- Closed questions and higher order thinking
- Multiple choice questions, part two
- Research on multiple choice questions
and Daisy Christodoulou’s brilliant book: Making Good Progress?: The Future of Assessment for Learning
Evidence Based Assessment