In April 2026, BBC News covered Wensleydale School's trial of Top Marks AI for GCSE mock marking — making it one of the first state schools in the north of England to implement the technology in a live assessment setting. Here's what the trial found, why the short-term concerns are real, and what the evidence beyond a single-school pilot actually shows.
Wensleydale School is a state secondary in the Yorkshire Dales. Their trial, led by Deputy Head Charlie Barnett, covered three subjects with extended written answers — the kind of marking that is most time-consuming and most difficult to do consistently.
Crucially, teachers marked alongside the AI throughout — not instead of it. The trial was designed as a structured comparison, not a handover. That framing matters, and it is how we recommend every first-term adoption be approached.
Headteacher Julia Polley was direct about the headline result:
"We have to say that we were really impressed because AI marking gives detailed feedback to the students, which teachers can do but it takes a long time to do it."
— Julia Polley, Headteacher, Wensleydale School (BBC News)
The feedback quality was the standout finding: detailed, assessment-objective-referenced feedback on extended written answers, the kind that takes an experienced teacher 15–20 minutes per script to produce, generated at scale without sacrificing depth.
The second finding that caught Polley's attention was objectivity, a problem that does not get discussed enough in AI marking debates:
"We don't want our teachers to interpret what the kids have written and give them the benefit of the doubt because they know them and they know they're trying hard. That was the bit that we were trying to unpick — AI won't see it that way."
— Julia Polley, Headteacher, Wensleydale School (BBC News)
Human markers are not neutral. They know which students try hard, which ones have improved, which ones have difficult circumstances. That knowledge is valuable in many contexts — but it introduces systematic bias into summative marks that should reflect exam board standards, not teacher relationships. An AI marker applies the mark scheme to the script in front of it, no more and no less.
Polley's framing of what the platform offers is one of the clearest we have heard from any school:
"It's like a sense check to make sure what we are saying is right with what exam boards will say."
— Julia Polley, Headteacher, Wensleydale School (BBC News)
Polley was equally candid about the workload picture in the short term. The initial setup — barcoding scripts, uploading question by question, reviewing AI marks alongside their own — increased rather than reduced teacher workload in the first weeks.
"Our staff were absolutely aghast to start with, saying 'but we want to mark our papers' because they want to know where their students are at."
— Julia Polley, Headteacher, Wensleydale School (BBC News)
This is an honest and familiar picture. Any school considering a trial should budget for a learning curve — typically several weeks of parallel marking before the workflow becomes systematic and the time savings materialise. Charlie Barnett's framing of what the trial was actually for captures it well:
"This was never about replacing teachers. The purpose of the trial was to explore whether AI can apply exam board standards accurately and whether it could help teachers provide more detailed feedback to students."
— Charlie Barnett, Deputy Head, Wensleydale School
Schools seeing the largest workload reductions — including those reporting a 50% reduction in marking time — are generally in their second or third term of use, once the processes are embedded and teachers have calibrated their review workflow against the AI's output.
The BBC piece also features Dr Theocharis Kyriacou, associate professor of AI at York St John University, who raises two concerns that deserve a direct response.
On outsourcing skill: Kyriacou argues that "completely outsourcing marking would not be a good use of AI, as it would take the skill out of the hands of teachers." We agree entirely. Top Marks AI is built as a tool that sharpens teacher judgement, not one that replaces it. The platform's ScaMP feedback framework is designed to surface mark-scheme reasoning explicitly — so that teachers and students alike can understand why a response sits in a particular band, not just that it does. The "sense check" framing Polley uses is precisely the right one.
On student distrust: Kyriacou notes discomfort among students and parents about AI marking, based on forum discussions. The NAHT's response is the right one: transparency is essential, and schools should be open about when and how AI tools are used. What the accuracy data can address is the underlying concern — that AI marks are less reliable than human ones. As we explain below, the evidence suggests the opposite is true for purpose-built platforms calibrated against board standardisation materials.
A trial at one school, however positive, is a data point. What a single-school pilot cannot show is whether accuracy holds at scale, across question types, and across the full ability range. That is where published accuracy studies matter.
The published accuracy figures have been independently corroborated, not by Top Marks AI itself, but by Ark Schools, one of the UK's largest multi-academy trusts, and by Community Schools Trust. Independent validation is the key distinction between a platform that claims to be accurate and one that can demonstrate it.
The comparison point matters too. The benchmark is not perfection — it is the consistency of experienced human markers working under realistic conditions. The Fowles (2009) finding that only ~45% of experienced markers place responses within exam board tolerance is not a criticism of teachers; it reflects the genuine difficulty of applying complex mark schemes consistently across large volumes of scripts. Purpose-built AI, calibrated against standardisation materials, removes that variability.
The question schools should ask is not "is AI marking perfect?" It is "is AI marking more consistent than the alternative?" The published evidence — and Wensleydale's experience — suggests the answer is yes.
Wensleydale's experience — impressive feedback quality, honest short-term overhead, a structured teacher-led review — is a realistic picture of what a first term looks like. It is not a magic wand. It is a tool that rewards a systematic approach and pays back the initial investment over time.
If your school is considering a trial, three things matter most: budget for several weeks of parallel marking before the time savings materialise, keep teachers reviewing alongside the AI rather than handing marking over to it, and be transparent with students and parents about when and how the tool is used.
Wensleydale's trial is one of the most visible examples of AI marking entering mainstream state-school use. It will not be the last. The schools that move earliest — and move carefully — will have a meaningful head start in embedding a tool that, used well, gives teachers more time and students better feedback.