Measuring and Improving The Quality of Science Writing in Schools

This post is part of a series – a symposium – on Assessment for Learning (AfL). The previous posts are well referenced and the result of much thought. My contribution is more anecdotal and speculative.

Part one of the series is by Adam Boxer here. In it he sets the context for the following posts.

Part two is by Rosalind Walker here. She discusses the nature of school science and implications for the classroom.

Part three is by Niki Kaiser here. This post explores concepts, threshold concepts, misconceptions, knowledge and understanding.

Part four is by Deep Ghataura here. It is about the validity of formative assessment.

My post is about writing in science, how assessment often distorts writing and how we might be able to improve both scientific writing in school and its assessment.


After 18 years of secondary science teaching, I left secondary school to become a primary school teacher. I had a suspicion that I would learn a lot. For three of my four primary years, I taught in year 6: SATs year. I had to learn quickly about teaching reading, writing and mathematics. But I have an issue with writing.

My experience is limited to a small group of primary schools in socially disadvantaged areas where literacy and numeracy are (rightly) prioritised. On one principal’s wall was an optician’s chart reading:

All that matters is....

As soon as they are able to, the pupils write extended texts every week.

But by year 6 it is often strange writing. My class wrote a scientific text about cholera (we were studying the Victorians, and we had read about cholera in Great Yarmouth). I made a model cholera bacterium out of a sock and a luminous pipe-cleaner. I modelled the style of writing I wanted – clear, short sentences using scientific terminology, low on adjectives. Instead I got:

“The iridescent flagellum of the cholera bacillus propels the vicious, spiteful, greedy microorganism to the impenetrable wall of the stomach.”

Which is odd, but it ticks several KS2 writing SATs checkboxes. Many of my pupils could write pages of this. It isn’t how scientists write.

1953 Crick and Watson paper on the Molecular Structure of Nucleic Acids.

How precise and unambiguous this is.

It is the pressure of the rubric – the KS2 writing standards – which drives the writing in odd directions. This reminds me of this excellent graphic by Greg Ashman (here):

Rubrics - Greg Ashman

Back in high school, I see a different kind of science writing, driven by a different rubric: the 9 mark question. Different criteria – different distortions.

Not only does rubric marking like this distort writing, it is also unreliable (see Christodoulou: Making Good Progress, chapter 4).

No rubric will ever encourage learners to develop as scientific writers – to ‘improve’ on a rubric score, the writer would need to contort the writing to tick a box. One solution is to replace the rubric with expert opinion.

Comparative judgement is an assessment process which can score scientific writing using expert judgement. When an experienced science teacher is presented with two examples of student writing, she can quickly (very quickly – in as little as 15 seconds) make a reliable judgement about which text is better. After repeating this process many times, the texts can be ranked and scored by an algorithm. If several judges take part, the process is even quicker and even more reliable.
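
For readers curious about what the ranking step might look like, here is a minimal sketch in Python. It assumes a Bradley-Terry model, which is a common choice for paired comparisons; I have not confirmed which model any particular comparative judgement tool uses. The function name `bradley_terry`, the text labels "A", "B" and "C", and the small `pseudo_win` smoothing value are all my own inventions for illustration.

```python
from collections import defaultdict


def bradley_terry(judgements, iterations=200, pseudo_win=0.1):
    """Estimate a relative strength for each text.

    judgements: list of (winner, loser) pairs, one per paired comparison.
    pseudo_win: assumed smoothing value so that a text which never wins
        keeps a non-zero strength (my own choice, not part of the model).
    """
    texts = sorted({text for pair in judgements for text in pair})
    wins = defaultdict(int)
    meetings = defaultdict(int)  # how often each pair of texts was compared
    for winner, loser in judgements:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1

    strength = {text: 1.0 for text in texts}
    for _ in range(iterations):
        updated = {}
        for text in texts:
            # Sum over every other text this one was compared with.
            denominator = sum(
                meetings[frozenset((text, other))]
                / (strength[text] + strength[other])
                for other in texts
                if other != text
            )
            updated[text] = (wins[text] + pseudo_win) / denominator
        # Rescale so the strengths average 1; only the ratios matter.
        scale = len(texts) / sum(updated.values())
        strength = {text: value * scale for text, value in updated.items()}
    return strength


# Hypothetical judgements: "A" usually beats "B", and "B" usually beats "C".
judgements = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("A", "C"), ("A", "C"),
    ("B", "C"), ("B", "C"), ("C", "B"),
]
scores = bradley_terry(judgements)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With these invented judgements, text A ranks above B, and B above C. The point of the sketch is that no judge ever assigns a mark: the scores emerge entirely from many quick "which is better?" decisions.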

And the beauty is that all of the nuances of expert knowledge are used. A text which was well written, expressing complex ideas with clarity and precision, would outscore a flashy text pumped full of emotive language and inappropriate grammatical features.

In addition to encouraging quality science writing, comparative judgement allows examples of previously scored work to be included in future assessments. These scored pieces act as anchors, allowing the teacher to see which learners have improved and by how much. Teachers can have reliable feedback on whether their strategies for teaching writing are really working.

At Paradigm Trust, we have been using judged texts as exemplars for classroom use. Dylan Wiliam (here) recommends using exemplars instead of student-friendly rubrics to help learners understand how to improve their own writing. However, Christian Bokhove (here) correctly warns us that learners don’t always take from an exemplar what we hope they will (in his example, learners see an exemplar with subheadings – they proceed to write bad texts with subheadings). This means we need to be careful with how we teach using exemplars – expect to help learners unpick the good from the superficial.

My colleagues at Paradigm Trust have begun to use comparative judgement to establish an expected quality of writing for each year group (1 to 11). We use comparative judgement to rank each piece of writing from each year group and then ask our subject specialists to identify work which we agree demonstrates the expected standard. We are then able to include the specimen pieces in future assessments to act as thresholds. To make our benchmarks more meaningful, we are working with other schools locally on shared writing tasks. We hope to agree across schools which pieces to use as our ‘expected standard’ exemplars.

I would like to add a warning: the quality of the judgements depends on the expertise of the judges. Recently I organised a comparative judgement round which produced odd results. This round had three expert judges and five trainees. We were using NoMoreMarking, an online comparative judgement engine. The algorithm scores the texts, but also provides a score for the reliability of the judges. In this instance, the trainees were more reliable than the experienced teachers. It took us some time to realise that the trainees were reliable because they were all using the same surface features of the texts: presentation, spelling and subject-specific vocabulary (used correctly or not). The experienced teachers appeared to be less reliable because the algorithm does not weight the judgement by teacher experience. Removing the novice teachers’ judgements revealed a very good agreement between the experts.
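
To show how a larger group of like-minded judges can make a smaller group look unreliable, here is a rough sketch in Python. It is a simplified stand-in, not the reliability statistic NoMoreMarking reports: it just measures how often each judge sides with the pooled majority on a pair of texts. The function name `agreement_with_majority`, the judge names and the text labels "neat" and "deep" are all invented for illustration.

```python
from collections import Counter, defaultdict


def agreement_with_majority(judgements):
    """Return, for each judge, the fraction of decisions matching the majority.

    judgements: list of (judge, winner, loser) tuples.
    Pairs where the pooled vote is tied are skipped, as there is no majority.
    """
    votes = defaultdict(Counter)  # pair of texts -> votes for each text
    for judge, winner, loser in judgements:
        votes[frozenset((winner, loser))][winner] += 1

    agreed = Counter()
    counted = Counter()
    for judge, winner, loser in judgements:
        top = votes[frozenset((winner, loser))].most_common(2)
        if len(top) == 2 and top[0][1] == top[1][1]:
            continue  # tied vote on this pair
        counted[judge] += 1
        agreed[judge] += int(winner == top[0][0])
    return {judge: agreed[judge] / counted[judge] for judge in counted}


# Hypothetical round: five trainees prefer the tidy text, three experts
# prefer the text with the stronger science.
trainees = [(f"trainee{i}", "neat", "deep") for i in range(5)]
experts = [(f"expert{i}", "deep", "neat") for i in range(3)]

everyone = agreement_with_majority(trainees + experts)
experts_only = agreement_with_majority(experts)
print(everyone)
print(experts_only)
```

In this made-up round the trainees agree with the majority every time and the experts never do, simply because the trainees outnumber them. Scoring the experts on their own shows that they agree with each other perfectly, which is the pattern we saw once the novice judgements were removed.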

I found a similar issue when I was demonstrating comparative judgement to history and English specialists using a science example I had to hand. Their expertise led them to a different conclusion to mine.

Concept Cartoon Comparative Judgement
The English and history specialists preferred this text (the language structure is sophisticated and appropriate to the task).
Concept Cartoon CJ
I preferred this (she’s demonstrated causation and used the technical term ‘current’ appropriately).

However, this potential weakness has also proven to have a positive side-effect: when an experienced subject specialist talks through her judgements with a novice or a colleague with a different specialism, her implicit expertise is made explicit. The values and thought processes of a specialist are shared. This seems to be especially helpful for primary colleagues who have to be specialists in multiple subjects.

So why would you use anything else for assessment? Extended writing in science isn’t particularly helpful for identifying gaps – there are more efficient ways of doing this (see the University of York’s diagnostic questions here). Extended writing is an inefficient way of assessing the domain – the writing will typically be about one small part of the taught curriculum.

What writing does well is demonstrate a learner’s ability to select and process knowledge in a deliberate manner. When we neglect writing in science lessons, we restrict the opportunities for learners to practise explanation and argument – and this is as important as knowing stuff.



  1. I think you set the bar too high for Year 6 (age 10, Primary). I would be pleased to get the fun example you quote esp if it met “several KS2 writing SATs checkboxes”; your Crick & Watson example appears stuffy. But I appreciate your motives.

    I appreciate times have changed. When I was at secondary school, science writing majored on the format of the experimental report, but that is not to say long form was exempt – I can still remember Biology write-ups of reproduction (that’s how it was taught then). Other subjects, eg History, emphasised a different approach to writing style. As a scientist it has taken me a long time, beyond Uni, to develop my writing skills.

    The Nomoremarking software appears interesting as are Chris Wheadon’s associated Medium articles. We used online anti-plagiarism software in College which was a nightmare. Composition at that stage and for that cohort is abysmal so I applaud the emphasis on writing which I would add, graffiti style, to the poster/chart that you started out your thought provoking article with.

