AI Marking for Edexcel IGCSE English: Achieving Unprecedented Accuracy

Richard Davis, CEO: 50-essay analysis demonstrates Top Marks AI achieving a Pearson correlation coefficient of 0.9, outperforming experienced human markers, January 2, 2025

GCSE English Marking with Top Marks AI: A Deep Dive into Our Latest Study

When a large secondary school approached us to trial Top Marks AI on their Year 11 Shakespeare mocks, we welcomed the opportunity: could our system support their experienced teachers by reliably aligning with their marking of AQA English Literature Paper 1 Section A responses? Working with the department, we conducted a rigorous trial comparing our AI against four experienced markers, including both department teachers and an external assessor.

The findings? Remarkable consistency and strong agreement with both internal and external human markers. Even more impressively, these were authentic handwritten mock exam responses - just as students produce in their GCSEs - which Top Marks AI both transcribed and marked with high accuracy, demonstrating a true end-to-end solution for English departments.

Figure: Graphical comparison of average marks

The Study: Humans vs. AI in Essay Marking

The study involved:

  • 30 handwritten GCSE Shakespeare essays marked by four human markers and Top Marks AI.
  • Use of our batch marking tool for handwritten scripts, which allows multiple essays to be uploaded in a single document.
  • Assessment across two components:

    • a) AO1-3: Main assessment criteria focusing on understanding, language/form/structure analysis, and historical context.
    • b) AO4: Secondary assessment criteria emphasising spelling, punctuation, and grammar (SPaG).
  • Agreement defined as:

    • For the main assessment (AO1-3): two marks agree when they fall within ±3 marks of each other on the combined total across all three assessment objectives.
    • For SPaG (marked out of 4): two marks agree only when they are identical.
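
The agreement rule above can be expressed in a few lines. The following Python sketch is illustrative only: the function name `agreement_rate`, its parameters, and the sample marks are assumptions made for this example, not part of Top Marks AI or the study data.

```python
def agreement_rate(marks_a, marks_b, tolerance):
    """Fraction of essays on which two markers agree.

    Two marks agree when they differ by at most `tolerance`:
    use 3 for the AO1-3 total and 0 for SPaG (exact match).
    """
    if len(marks_a) != len(marks_b) or not marks_a:
        raise ValueError("both markers need marks for the same non-empty set of essays")
    agreed = sum(abs(a - b) <= tolerance for a, b in zip(marks_a, marks_b))
    return agreed / len(marks_a)


# Hypothetical marks for five essays (not taken from the study).
teacher_main = [12, 18, 9, 21, 15]
ai_main = [14, 17, 13, 20, 15]
teacher_spag = [3, 2, 4, 3, 2]
ai_spag = [3, 3, 4, 3, 1]

main_rate = agreement_rate(teacher_main, ai_main, tolerance=3)
spag_rate = agreement_rate(teacher_spag, ai_spag, tolerance=0)
print(f"AO1-3 agreement: {main_rate:.0%}")  # 80%
print(f"SPaG agreement: {spag_rate:.0%}")   # 60%
```

Passing the tolerance as a parameter keeps a single function for both components, so the only difference between the AO1-3 and SPaG calculations is the threshold.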

Unpacking the Data: A Detailed Analysis

Overall Performance

Main Marks:

  • Average mark per essay according to the human markers: 13.0
  • Average mark per essay according to Top Marks AI: 13.7

SPaG:

  • Sum of the per-essay human averages (the mean of the human markers' SPaG marks for each essay): 77 marks
  • Sum of the AI SPaG marks across all essays: 76 marks
  • Average SPaG mark per essay (human markers): 2.6 (77/30)
  • Average SPaG mark per essay (AI): 2.5 (76/30)

Conclusion: The alignment between Top Marks AI and human consensus is striking: the average marks differ by only 0.7 for the main assessment and by less than 0.1 for SPaG (one mark in total across 30 essays).
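
The per-essay averages follow directly from the reported totals. As a quick check, this short Python sketch reproduces the arithmetic; the variable names are assumptions chosen for the example, and the inputs are the figures quoted above.

```python
ESSAYS = 30

human_spag_total = 77  # sum of per-essay human averages, as reported above
ai_spag_total = 76     # sum of AI SPaG marks, as reported above

human_spag_mean = human_spag_total / ESSAYS
ai_spag_mean = ai_spag_total / ESSAYS
spag_gap = human_spag_mean - ai_spag_mean

# Main assessment averages, as reported above.
human_main_mean = 13.0
ai_main_mean = 13.7
main_gap = ai_main_mean - human_main_mean

print(f"Human SPaG mean: {human_spag_mean:.1f}")  # 2.6
print(f"AI SPaG mean: {ai_spag_mean:.1f}")        # 2.5
print(f"SPaG gap: {spag_gap:.2f}")                # 0.03
print(f"Main gap: {main_gap:.1f}")                # 0.7
```

Before rounding, the SPaG gap is about 0.03 marks per essay; it appears as 0.1 only when the two rounded averages are subtracted.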

Essay-by-Essay Breakdown

To provide a clearer picture, here's the granular data comparing human markers and Top Marks AI:

Key Insight: The AI's marks fall within a close range of the human averages, demonstrating consistency and reliability.

Handling Handwritten Scripts: The Top Marks AI Advantage

One of the standout features of this study is that all essays were handwritten, simulating the typical format in which students submit their work during exams. Top Marks AI successfully:

  • Transcribed all handwritten essays with high accuracy, converting them into digital text ready for analysis.
  • Processed multiple essays simultaneously using our batch marking tool for handwritten scripts, enhancing efficiency.
  • Eliminated the need for manual data entry, reducing the potential for human error and saving valuable time.

Key Benefit: This capability highlights Top Marks AI as a comprehensive, end-to-end solution that seamlessly integrates into existing educational workflows.

Agreement Rates: A Closer Look

Agreement in AO1-3 (Main Assessment Criteria)

Figure: Agreement rates bar chart

  • Human Marker 1:
    • 100% agreement with Human Markers 2 and 3.
    • 83% agreement with Human Marker 4.
    • 93% agreement with Top Marks AI.
  • Human Marker 2:
    • 100% agreement with Human Marker 3.
    • 75% agreement with Human Marker 4.
    • 73% agreement with Top Marks AI.
  • Human Marker 3:
    • 66% agreement with Human Marker 4.
    • 80% agreement with Top Marks AI.
  • Human Marker 4:
    • 92% agreement with Top Marks AI.

Important Context: While Human Markers 1-3 were teachers familiar with these students, Human Marker 4 was an external assessor with no prior knowledge of the students or their typical performance. The high agreement between Top Marks AI and Marker 4 (92%) suggests that AI assessment, like external marking, may help reduce unconscious interpersonal bias in the marking process.

Key Insight: Top Marks AI agrees with every human marker on at least 73% of essays, and most closely with Human Markers 1 (93%) and 4 (92%). The high agreement with Marker 4 suggests that the AI system is replicating the impartial assessment style of an external examiner while staying close to experienced teachers' judgments.

Agreement in AO4 (Secondary Assessment Criteria)

Figure: AO4 agreement heatmap

  • Human Marker 1:
    • 57% agreement with Human Marker 2.
    • 40% agreement with Human Marker 3.
    • 83% agreement with Human Marker 4.
    • 80% agreement with Top Marks AI.
  • Human Marker 2:
    • 70% agreement with Human Marker 3.
    • 50% agreement with Human Marker 4.
    • 50% agreement with Top Marks AI.
  • Human Marker 3:
    • 33% agreement with Human Marker 4.
    • 50% agreement with Top Marks AI.
  • Human Marker 4:
    • 83% agreement with Top Marks AI.

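Key Insight: Human markers vary considerably on AO4, agreeing with one another on between 33% and 83% of essays. Top Marks AI falls within that range: it matches Human Markers 1 and 4 on 80% and 83% of essays respectively, and Human Markers 2 and 3 on 50%.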
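
Pairwise figures like those listed for AO1-3 and AO4 can be produced by comparing every pair of markers in turn. The Python sketch below shows one way to do this. The function name `pairwise_agreement`, the marker labels, and the sample marks are hypothetical, and the treatment of essays that a marker did not mark (skipped for that pair) is an assumption of this sketch, since the study does not describe how such cases were handled.

```python
from itertools import combinations


def pairwise_agreement(marks_by_marker, tolerance):
    """Agreement rate for every pair of markers.

    `marks_by_marker` maps a marker name to a list of marks, one per essay,
    with None where that marker did not mark the essay. Only essays marked
    by both members of a pair are compared.
    """
    rates = {}
    for name_a, name_b in combinations(marks_by_marker, 2):
        shared = [
            (a, b)
            for a, b in zip(marks_by_marker[name_a], marks_by_marker[name_b])
            if a is not None and b is not None
        ]
        if not shared:
            continue  # no essays in common, so no rate can be reported
        agreed = sum(abs(a - b) <= tolerance for a, b in shared)
        rates[(name_a, name_b)] = agreed / len(shared)
    return rates


# Hypothetical SPaG marks for four essays (not taken from the study).
spag_marks = {
    "Marker A": [3, 2, 4, 1],
    "Marker B": [3, 3, None, 1],
    "AI": [3, 2, 4, 2],
}

rates = pairwise_agreement(spag_marks, tolerance=0)
for (name_a, name_b), rate in rates.items():
    print(f"{name_a} vs {name_b}: {rate:.0%}")
```

With `tolerance=0` this applies the exact-match rule used for SPaG; passing `tolerance=3` to the same function gives the AO1-3 rule.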

What Does This Mean for Educators?

An End-to-End Solution

  • Handwriting Recognition: Top Marks AI can accurately transcribe handwritten essays, removing the need for manual typing and the errors it introduces.
  • Batch Processing: Our system allows multiple essays to be uploaded and processed simultaneously, saving time and resources.
  • Seamless Integration: Designed to fit into the existing workflows of schools and educators, accommodating the traditional formats students use.

Consistency and Reliability

  • High Agreement Rates: Top Marks AI's agreement rates with human markers are high, especially in the main assessment criteria.
  • Reduced Subjectivity: The AI mitigates the inherent subjectivity found among different human markers, supporting fair and consistent grading for all students.

Efficiency and Time Savings

  • Faster Turnaround: Top Marks AI can grade essays in a fraction of the time it takes human markers, freeing up valuable time for teachers to focus on personalised instruction.
  • Scalability: Ideal for handling large volumes of essays without compromising on accuracy.

Enhanced Feedback

  • Detailed Insights: Provides comprehensive feedback on each essay, highlighting strengths and areas for improvement.
  • Supports Learning: Helps students understand their performance in both content and SPaG, guiding them towards academic growth.

Visualising the Impact

Figure: Graphical comparison of average marks

The graph illustrates the close alignment between the average marks awarded by human markers and Top Marks AI, showcasing the AI's ability to mirror human judgment accurately.

Embracing the Future of Education with Top Marks AI

The data speaks volumes: Top Marks AI is not just a tool but a transformative partner in education. By aligning closely with human markers while offering unparalleled efficiency and consistency, it stands as a beacon for the future of educational assessment.

Why Choose Top Marks AI?

  • End-to-End Solution: From transcribing handwritten essays to providing detailed marks and feedback.
  • Accuracy: Matches human expertise with high agreement rates.
  • Efficiency: Saves countless hours in grading, allowing teachers to invest time where it matters most.
  • Consistency: Reduces discrepancies and biases in marking.
  • Comprehensive Feedback: Delivers detailed, actionable insights for students.

Experience Top Marks AI Today

We invite schools and departments to step into the future of education. While this study focused on AQA GCSE English Literature, Top Marks AI supports assessment across multiple exam boards and qualifications - from GCSE English Language and Literature to History, Religious Studies, Geography, and A-Level subjects, as well as International Baccalaureate. Let our AI enhance your teaching, support your students, and streamline your assessment process.

Conclusion: A New Era of Educational Assessment

The integration of AI in education is no longer a distant future but a present reality that offers tangible benefits. Top Marks AI stands at the forefront of this evolution, providing reliable, efficient, and consistent marking solutions that align with human expertise across a growing range of humanities and essay-based subjects.

Embrace the change. Enhance your teaching. Empower your students.

For more information, contact us directly at info@topmarks.ai.