A guest essay by Dr James Shea on why the move to AI-augmented assessment must reshape teacher training itself — from initial teacher training and the Early Career Framework through to the NPQs — and why its real benefits only materialise at trust scale.
Ten years ago, trainee teachers were expected to plan every lesson from scratch. Today, in many Multi-Academy Trusts (MATs), that expectation would feel inefficient at best, and professionally negligent at worst. Textbooks, websites and in-school CPD all supported the old model, but they didn't solve its core problem. On the surface, planning from scratch was a celebrated sign of professional autonomy and creative dedication; underneath, teachers were buckling under the workload. They were time-poor and desperate for more efficient ways to plan. Many MATs and forward-thinking schools have since moved towards centralised planning: high-quality shared resources that teachers adapt to the specific needs of their learners. This shift, when implemented well, has reduced workload, ensured curricular consistency and raised standards.
Early adopter trusts are now doing the same with assessment. The familiar model of an exhausted teacher marking exam scripts in the evenings and at weekends is becoming as antiquated as the hand-written lesson plan. However, the true value of AI in this space is not merely the automation of formative marking; it is the unlocking of strategic, trust-wide insights that were previously difficult to surface without a great deal of human time and energy.
None of this is without risk. AI-marked assessment raises legitimate questions about reliability, bias, and alignment with exam board standards. For many schools, the concern is not whether AI can mark, but whether it can be trusted to mark at scale.
When a teacher marks a set of mock examinations, they are often buried in the immediacy of the task: correcting errors, totting up marks, and providing a single 'next step'. Most schools do this once during KS4, and some have taken to doing it twice. Whilst the exercise is powerful, it is also overwhelming. And although the class-level data is valuable, it is inherently limited by the sample size of one classroom. By integrating AI into internal assessment systems, we move from marking to a sophisticated form of diagnostic analytics.
Consider the Assessment Objectives (AOs) in secondary science. AO1 focuses on direct recall. In the June 2024 AQA Combined Science Biology papers, this accounted for exactly 35 per cent of the available marks (49 out of 140): remarkably close to the grade boundary of a good pass. In other words, a student who secures the recall marks alone is within reach of a grade 5. In a traditional marking cycle, a student's failure to secure these marks might be recorded simply as a low score. AI can distinguish between weaknesses in recall, cued recall, and transfer (near and far) across large datasets. With the assistance of AI-augmented marking, teachers can see, in real time, whether their class is struggling more with the retrieval of facts or with the application of those facts in unfamiliar contexts, and how that compares with their peers across the trust. As Luckin (2018) argues, the power of AI lies in its ability to provide a more precise understanding of the learner's journey.
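As a rough illustration of the comparison described above, the short Python sketch below totals the marks secured for each type of question demand and reports how far a class sits above or below the trust, in percentage points. It is a minimal sketch, not the method of any platform named in this essay: the demand labels, the function names (secured_share, gap_to_trust) and every figure in it are invented for illustration, and it assumes question-level marks have already been tagged by demand type.

```python
from collections import defaultdict


def secured_share(records):
    """Return the share of available marks secured for each demand type.

    Each record is a tuple: (demand_type, marks_awarded, marks_available).
    """
    awarded = defaultdict(int)
    available = defaultdict(int)
    for demand, got, out_of in records:
        awarded[demand] += got
        available[demand] += out_of
    return {d: awarded[d] / available[d] for d in available}


def gap_to_trust(class_records, trust_records):
    """Class share minus trust share, in percentage points, per demand type."""
    class_share = secured_share(class_records)
    trust_share = secured_share(trust_records)
    return {
        d: round(100 * (class_share[d] - trust_share[d]), 1)
        for d in class_share
        if d in trust_share
    }


# Invented figures, for illustration only (hypothetical class and trust).
class_records = [
    ("recall", 6, 10),
    ("recall", 8, 10),
    ("application", 5, 10),
    ("far_transfer", 2, 10),
]
trust_records = [
    ("recall", 70, 100),
    ("application", 55, 100),
    ("far_transfer", 40, 100),
]

gaps = gap_to_trust(class_records, trust_records)
weakest = min(gaps, key=gaps.get)  # demand type with the largest shortfall
print(gaps)
print(weakest)
```

In this invented example the class matches the trust on recall but falls well behind on far transfer, which is the kind of pattern that a single total score would hide.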
Imagine a Year 11 data drop where, within hours, a subject lead can see not just that students struggled with electrolysis, but that difficulties clustered around far-transfer questions rather than recall or routine application. The response is not a generic revision push, but a targeted curriculum adjustment shared across schools within days. This is no longer hypothetical. In April 2026, BBC News covered Wensleydale School's trial of Top Marks AI — one of the first state schools to implement AI marking in a live assessment setting. Their trial (Top Marks AI, 2026a) spanned English Language, English Literature, History, and Geography across Year 11, with teachers marking alongside the AI as a comparison rather than a handover. Headteacher Julia Polley's summary of the platform captures the trust-standardisation case clearly: "It's like a sense check to make sure what we are saying is right with what exam boards will say" (BBC News, 2026). That sense check, applied at scale across a MAT, is what AI-enabled assessment makes possible. But those benefits are not automatic.
Leadership teams will need to decide not just which tools to adopt, but at what level they are standardised: department, school, or trust. The benefits described here only fully materialise at scale — and the accuracy case for structured adoption is now well evidenced. Purpose-built platforms calibrated against board standardisation materials are beginning to match, and in some cases exceed, experienced human markers on consistency metrics. The benchmark here is not perfection, but consistency. Decades of assessment research show that even experienced human markers vary in their judgements, particularly on extended written responses. Early AI trials suggest that, when tightly calibrated to mark schemes, automated systems may reduce some of that variability (Top Marks AI, 2026a).
This shift from manual marking to strategic feedback requires more than just software; it requires a radical update to our professional infrastructure. Mollick (2024) writes about how we will be augmented by AI in our work: sometimes AI will do part of the work for us, and sometimes we will be wholly integrated with it. If teachers are to be augmented by AI, our training frameworks must reflect this partnership. We must look towards the Initial Teacher Training and Early Career Framework (ITTECF) and the National Professional Qualifications (NPQs) to codify what it means to be an 'AI-literate' secondary teacher who can prepare their students for statutory GCSE and A level examinations. The assessment statements that underpin those frameworks should now contain expectations around AI-augmented marking. We should also ask difficult questions of our teacher training models. Trainees and ECTs still need to mark scripts themselves; they need to understand standards, common misconceptions and the nuance of student responses. But if AI can reliably generate first-pass marking and diagnostics, should we continue to require them to mark dozens of scripts in isolation? Or should we place greater emphasis on how well they interpret, challenge and act on assessment data?
Effective AI feedback frameworks follow two core principles: they tailor feedback to a student's current level, and they link each comment directly to evidence in the response and the relevant mark scheme (Top Marks AI, 2026b). Teachers who understand these principles can adapt AI output effectively, combining it with their knowledge of the individual pupil that no platform can replicate.
This level of standardisation could provide a safety net for trainees and early career teachers (ECTs). Instead of drowning in a sea of marking, they are presented with a diagnostic map of their students' precise examination difficulties. This allows them to focus their energy where it matters most: planning how to address those gaps in learning.
For senior leadership teams and curriculum leads, the move towards AI assessment is a strategic imperative rather than a technical one. Implementing these systems is an exercise in change management. It requires:
It is worth acknowledging that this shift carries a short-term overhead. In the first weeks, as teachers manage the upload workflow, parallel marking and review calibration, workload tends to increase rather than reduce. Schools reporting the largest time savings are typically in their second or third term of use, once processes are embedded. Leadership teams that frame adoption as a learning curve, rather than an immediate efficiency gain, retain staff confidence and see that investment pay back. The Wensleydale trial noted that some teachers were initially resistant, not out of scepticism about the technology, but because they valued the direct insight that personal marking gave them into their students' progress (BBC News, 2026). That instinct is professionally sound. The goal is not to remove it, but to give it better data to work with.
As argued by the Department for Education (2023), the goal of AI in education should be to reduce unnecessary workload while maintaining high standards and enabling more students to meet expected outcomes. Once systems are embedded, embracing AI-supported assessment and reforming teacher development from the ITTECF to the NPQs will free teachers to focus on what matters most: what to teach next, and how to teach it better. Within five years, the question for trusts will not be whether to adopt AI-supported assessment, but how to ensure it is used consistently enough to secure trust-wide standards. The risk is no longer adoption; it is fragmentation: a sector divided between those who have embedded AI-supported assessment and those who have not.