"Can we really trust AI to mark English Literature essays?" It's the question that comes up time and again in our conversations with schools and teachers.
As such, we've performed comprehensive evaluations to demonstrate just how accurate Top Marks' English Literature AI marking tools really are. The results speak for themselves!
We're examining the performance on Edexcel English Literature -- specifically, the Post-1914: 40 Mark Question.
Edexcel publishes numerous exemplar essays for its exam papers, and we've put our tool to the test using 78 of these exam-board-approved standardisation materials. The exemplars span a broad spectrum of answer quality and are used for standardisation, showing teachers what responses at various levels look like.
We ran all 78 essays through our dedicated marking tool. Then we measured the correlation between the official marks the board awarded each essay and the marks Top Marks AI assigned to those same essays.
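We employed the Pearson correlation coefficient. In short, it measures how closely two sets of marks rise and fall together: 1 means a perfect positive correlation, 0 means no linear relationship at all, and -1 means a perfect negative correlation.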
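For readers who like to see the arithmetic, here is a minimal sketch of how a Pearson coefficient can be computed in Python. The function name `pearson_r` and the four pairs of marks are made up purely for illustration; this is not the code or the data behind our evaluation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two sets of marks vary together...
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # ...scaled by how much each set varies on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical marks, for illustration only (not taken from the evaluation).
board_marks = [10, 20, 30, 40]
ai_marks = [12, 19, 33, 38]
print(round(pearson_r(board_marks, ai_marks), 3))
```

A result close to 1 means the two markers rank and space the essays in much the same way, even if their individual marks differ by a point or two.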
What sort of correlation do experienced human markers achieve when marking essays already marked by a lead examiner?
Cambridge Assessment conducted a rigorous study to measure precisely this. 200 GCSE English scripts - which had already been marked by a chief examiner - were sent to a team of experienced human markers. These experienced markers were not told what the chief examiner had given these scripts. Nor were they shown any annotations.
The Pearson correlation coefficient between the scores these experienced examiners gave and the scores the chief examiner gave was just below 0.7. This indicates a positive correlation, though one that is far from perfect. If you are interested, you can find the study here.
The results showed Top Marks AI achieving a correlation of 0.88 -- a strong positive correlation, well above the figure of just below 0.7 achieved by the experienced human markers in the Cambridge study. (Top Marks AI was likewise not told the "correct marks" or shown any annotations.)
Moreover, 71.79% of the marks Top Marks AI gave (56 of the 78 essays) were within 4 marks of the mark awarded by the exam board.
Another interesting metric is the Mean Absolute Error (MAE), which for our system was 3.13. In other words, the AI differed from the board by 3.13 marks on average, comfortably within 4 marks. As a percentage of the 40 marks available, that's an average difference of 7.8%.
In contrast, in that same Cambridge study, experienced examiners marking a 40-mark question showed a Mean Absolute Error of 5.64 marks, a difference of 14.1%. These results highlight the exceptional accuracy of Top Marks AI compared to traditional marking practices.
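To show how figures like these are arrived at, here is a minimal Python sketch that works out the Mean Absolute Error, the same error as a percentage of the 40-mark scale, and the share of essays marked within 4 marks of the board. The function name `marking_agreement` and its parameters are our own choices for illustration, and the five pairs of marks are just a handful of rows from the results table further down this page, so the output will not match the headline figures for all 78 essays.

```python
def marking_agreement(board_marks, ai_marks, max_mark=40, tolerance=4):
    """Summarise how far one set of marks sits from another."""
    if len(board_marks) != len(ai_marks) or not board_marks:
        raise ValueError("need two non-empty, equal-length lists of marks")
    # Size of each disagreement, ignoring whether the AI was high or low.
    abs_errors = [abs(ai - board) for board, ai in zip(board_marks, ai_marks)]
    mae = sum(abs_errors) / len(abs_errors)
    within = sum(1 for error in abs_errors if error <= tolerance)
    return {
        "mae": mae,
        "mae_percent_of_scale": 100 * mae / max_mark,
        "percent_within_tolerance": 100 * within / len(abs_errors),
    }


# Five rows from the results table: (board mark, AI mark).
board = [38.0, 40.0, 15.0, 29.0, 23.0]
ai = [39.3, 40.0, 17.0, 21.6, 32.0]

summary = marking_agreement(board, ai)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

On these five rows the sketch reports an MAE of about 3.94 marks (9.85% of the scale), with 60% of the essays within 4 marks. The same arithmetic on the headline figure gives 3.13 / 40 = 7.8%.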
We don't claim that Top Marks AI is infallible, but when it does get things wrong, just how bad is it? Well, let's turn to the Root Mean Square Error (RMSE) to find out. RMSE is a measure of error that gives extra weight to large mistakes. When you square the number 1, you still get 1, and when you square 2, you only make a small jump to 4. But square 5, and you're suddenly all the way up at 25. That's how RMSE works - it (essentially!) highlights large errors by squaring each error, averaging the squares, and then taking the square root to bring the result back onto the marks scale.
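Top Marks AI's Root Mean Square Error was 4.38. That is only a little above the Mean Absolute Error of 3.13, which tells us that large errors are uncommon: the biggest single discrepancy in the table below is 18.4 marks, and the typical error remains small relative to the 40-mark scale.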
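To see why RMSE is sensitive to big misses, consider this small Python sketch. It compares two made-up sets of marking errors that share the same Mean Absolute Error: one where every essay is off by 3 marks, and one where three essays are spot on and a fourth is off by 12. The function names and both error lists are hypothetical and are not drawn from our results.

```python
import math


def mean_absolute_error(errors):
    """Average size of the errors, ignoring their sign."""
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_square_error(errors):
    """Square each error, average the squares, then take the square root."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical errors (AI mark minus board mark), for illustration only.
steady = [3, 3, 3, 3]   # always a little out
spiky = [0, 0, 0, 12]   # usually perfect, occasionally far out

for name, errors in (("steady", steady), ("spiky", spiky)):
    print(name, mean_absolute_error(errors), root_mean_square_error(errors))
# prints:
# steady 3.0 3.0
# spiky 3.0 6.0
```

Both sets average 3 marks of error, but the single 12-mark miss doubles the RMSE. An RMSE that sits close to the MAE is therefore a sign that big misses are rare.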
You can see the full side-by-side human and AI scores below.
| Essay ID | Board Score | Top Marks AI Score | Difference |
|---|---|---|---|
| June 2019 An Inspector Calls 2 (-) (38).pdf | 38.0 | 39.3 | +1.3 |
| June 2019 Animal Farm 1 (-) (40).pdf | 40.0 | 40.0 | +0.0 |
| June_2022 Anita and Me 1 (-) (15).pdf | 15.0 | 17.0 | +2.0 |
| June 2019 An Inspector Calls 1 (-) (31).pdf | 31.0 | 31.1 | +0.1 |
| June_2022 Coram Boys 1 (-) (22).pdf | 22.0 | 22.7 | +0.7 |
| June_2022 Boys Don't Cry 1 (-) (23).pdf | 23.0 | 24.7 | +1.7 |
| June_2022 Boys Don't Cry 1 (-) (29).pdf | 29.0 | 21.6 | -7.4 |
| June 2019 Blood Brothers 1 (-) (14).pdf | 14.0 | 17.7 | +3.7 |
| June 2019 Animal Farm 2 (-) (21).pdf | 21.0 | 25.8 | +4.8 |
| June 2019 Blood Brothers 2 (-) (30).pdf | 30.0 | 32.4 | +2.4 |
| June 2019 Hobson's Choice 2 (-) (27).pdf | 27.0 | 25.1 | -1.9 |
| June 2019 Lord of the Flies 2 (-) (16).pdf | 16.0 | 21.5 | +5.5 |
| June 2019 The Woman in Black 1 (-) (29).pdf | 29.0 | 29.2 | +0.2 |
| June 2024 Animal Farm 1 (-) (23).pdf | 23.0 | 32.0 | +9.0 |
| June 2024 Anita and Me 1 (-) (30).pdf | 30.0 | 27.1 | -2.9 |
| June 2024 Animal Farm 4 (-) (28).pdf | 28.0 | 30.3 | +2.3 |
| June 2024 Anita and Me 2 (-) (16).pdf | 16.0 | 19.1 | +3.1 |
| June 2024 Blood Brothers 1 (-) (30).pdf | 30.0 | 29.0 | -1.0 |
| June_2022_Blood_Brothers_2_(-)_(24).pdf | 24.0 | 26.4 | +2.4 |
| June 2019 Anita and Me 1 (-) (22).pdf | 22.0 | 27.2 | +5.2 |
| June_2022_Animal_Farm_2_(-)_(30).pdf | 30.0 | 33.0 | +3.0 |
| June 2019 Anita and Me 2 (-) (23).pdf | 23.0 | 27.3 | +4.3 |
| June_2022 Coram Boys 2 (-) (20).pdf | 20.0 | 25.9 | +5.9 |
| June 2019 The Woman in Black 2 (-) (40).pdf | 40.0 | 21.6 | -18.4 |
| June 2019 Lord of the Flies 1 (-) (32).pdf | 32.0 | 29.6 | -2.4 |
| June 2019 Journey's End 1 (-) (26).pdf | 26.0 | 26.5 | +0.5 |
| June 2024 Blood Brothers 2 (-) (18).pdf | 18.0 | 23.0 | +5.0 |
| June 2024 Boys Don't Cry 1 (-) (31).pdf | 31.0 | 25.5 | -5.5 |
| June_2022_Journeys_End_1_(-) (17).pdf | 17.0 | 16.3 | -0.7 |
| June_2022_Journeys_End_2_(-) (34).pdf | 34.0 | 30.4 | -3.6 |
| November_2020_Blood_Brothers_1_(-) (15).pdf | 15.0 | 18.1 | +3.1 |
| June 2024 Coram Boy 1 (-) (12).pdf | 12.0 | 15.5 | +3.5 |
| June 2024 Refugee Boy 2 (-) (24).pdf | 24.0 | 21.0 | -3.0 |
| June_2022 Anita and Me 2 (-) (40).pdf | 40.0 | 28.6 | -11.4 |
| June_2022 Lord of the Flies 1 (-) (20).pdf | 20.0 | 24.5 | +4.5 |
| June_2024_An_Inspector_Calls_2_(-) (27).pdf | 27.0 | 26.2 | -0.8 |
| June_2022 Lord of the Flies 2 (-) (26).pdf | 26.0 | 27.6 | +1.6 |
| June_2022 The Empress 1 (?) (26).pdf | 26.0 | 25.9 | -0.1 |
| June_2022 The Woman in Black 1 (-) (14).pdf | 14.0 | 10.1 | -3.9 |
| June_2022 The Woman in Black 2 (-) (38).pdf | 38.0 | 35.9 | -2.1 |
| June 2024 Animal Farm 3 (-) (13).pdf | 13.0 | 11.2 | -1.8 |
| June 2024 Hobson's Choice 1 (-) (11).pdf | 11.0 | 11.3 | +0.3 |
| June 2024 Refugee Boy 1 (-) (16).pdf | 16.0 | 14.7 | -1.3 |
| June_2024_An_Inspector_Calls_4 (-) (15).pdf | 15.0 | 16.8 | +1.8 |
| June 2024 The Empress 1 (-) (28).pdf | 28.0 | 28.1 | +0.1 |
| June 2024 Boys Don't Cry 2 (-) (14).pdf | 14.0 | 20.4 | +6.4 |
| June 2024 Woman in Black 2 (-) (40).pdf | 40.0 | 36.8 | -3.2 |
| June 2024 Coram Boy 2 (-) (19).pdf | 19.0 | 16.3 | -2.7 |
| June 2024 Woman in Black 1 (-) (35).pdf | 35.0 | 32.1 | -2.9 |
| June 2024 Lord of the Flies 2 (-) (40).pdf | 40.0 | 33.0 | -7.0 |
| June 2024 Journey's End 1 (-) (21).pdf | 21.0 | 26.0 | +5.0 |
| June 2024 Hobson's Choice 2 (-) (23).pdf | 23.0 | 24.0 | +1.0 |
| June 2024 Animal Farm 2 (-) (40).pdf | 40.0 | 35.9 | -4.1 |
| June 2024 Journey's End 2 (-) (40).pdf | 40.0 | 33.2 | -6.8 |
| November_2021_An_Inspector_Calls_1_(-) (29).pdf | 29.0 | 33.4 | +4.4 |
| November_2021_An_Inspector_Calls_2_(-) (34).pdf | 34.0 | 33.8 | -0.2 |
| June_2024_An_Inspector_Calls_1a_(-) (10).pdf | 10.0 | 8.3 | -1.7 |
| June_2022 Hobson's Choice 2 (-) (20).pdf | 20.0 | 22.5 | +2.5 |
| June 2024 The Empress 2 (-) (20).pdf | 20.0 | 20.8 | +0.8 |
| June 2019 Hobson's Choice 1 (-) (23).pdf | 23.0 | 24.4 | +1.4 |
| June_2022 Hobson's Choice 1 (-) (34).pdf | 34.0 | 33.5 | -0.5 |
| June 2024 Lord of the Flies 1 (-) (32).pdf | 32.0 | 31.3 | -0.7 |
| June_2022_An_Inspector_Calls 1 (-) (14).pdf | 14.0 | 15.0 | +1.0 |
| June_2022_Blood_Brothers_1_(-) (15).pdf | 15.0 | 17.5 | +2.5 |
| June_2022_An_Inspector_Calls_4 (-) (10).pdf | 10.0 | 8.1 | -1.9 |
| November_2021_Animal_Farm_1_(-) (25).pdf | 25.0 | 29.2 | +4.2 |
| June_2022 Refugee Boy 1 (-) (29).pdf | 29.0 | 26.6 | -2.4 |
| June_2022 Refugee Boy 2 (-) (38).pdf | 38.0 | 27.9 | -10.1 |
| June_2022_Animal_Farm_1_(-) (7).pdf | 7.0 | 8.0 | +1.0 |
| November_2020_An_Inspector_Calls_1_(-) (19).pdf | 19.0 | 18.9 | -0.1 |
| June 2019 Journey's End 2 (-) (36).pdf | 36.0 | 26.5 | -9.5 |
| June_2024_An_Inspector_Calls_5_(-) (24).pdf | 24.0 | 26.0 | +2.0 |
| June_2024_An_Inspector_Calls_1b_(-) (11).pdf | 11.0 | 11.2 | +0.2 |
| June_2024_An_Inspector_Calls_3_(-) (40).pdf | 40.0 | 38.6 | -1.4 |
| June_2022_An_Inspector_Calls_2_(-) (24).pdf | 24.0 | 31.4 | +7.4 |
| June_2022_An_Inspector_Calls_5_(-) (30).pdf | 30.0 | 31.8 | +1.8 |
| June_2024_An_Inspector_Calls_6_(-) (40).pdf | 40.0 | 38.5 | -1.5 |
| June_2022_An_Inspector_Calls_3 (-) (40).pdf | 40.0 | 40.0 | +0.0 |
The correlation is also easy to see on a graph.
First, here's a scatter graph to show you what a theoretical perfect correlation of 1 would look like:
Now, let's look at the real-life graph, drawn from the data above:
On the horizontal axis, we have the mark given by the exam board. On the vertical, the mark given by Top Marks AI. The individual dots are the essays -- their position tells us both the mark given by the exam board and the mark given by Top Marks AI. You can see how closely it resembles the theoretical graph depicting perfect correlation.
Discover how Top Marks AI can revolutionise assessment in education. Contact us at info@topmarks.ai.