Assessment & Feedback
Can AI Mark Essays Fairly? What the Research and Our Pilots Actually Show
Published by Eduface | May 2026 | 9 min read
You returned 80 essays last semester. Somewhere in the back of your mind, a question
lingers: would you score them the same way if you marked them again next week?
Research suggests the honest answer is: probably not quite. If marking variability is a
problem for experienced humans, what does that mean for AI essay marking? And can an
automated system actually do this fairly?
Can AI mark essays fairly?
AI essay marking, when implemented with rubric-based methodology and mandatory
human oversight, can match or exceed the consistency of human inter-rater
agreement. In Eduface's UK pilot programmes, AI marks aligned with lecturer marks
95% of the time. Fairness depends not on whether a human or an algorithm marks
first, but on whether the criteria are clear, the process is transparent, and the final
grade remains under human control.
How reliable is human essay marking, really?
This is where the conversation needs to start. AI essay marking is routinely compared
against a gold standard that, on examination, is far from solid. Research on marking
reliability in UK higher education reveals a picture that should give every institution pause.
Bloxham (2009) reviewed marking and moderation practices across British universities
and concluded that current moderation procedures create substantial administrative
burden while adding little to actual mark accuracy.[1] The assumption that double-marking
or moderation produces reliable grades is, in her analysis, built on false premises rather
than evidence.
Sadler (2009) identified a related problem: grade integrity requires not only consistent
criteria but also that markers interpret those criteria in comparable ways.[2] In practice, two
experienced academics marking the same essay against the same rubric will frequently
arrive at different grades. Studies on general impression marking consistently show low
inter-rater correlation, with rubric-based assessment producing meaningfully higher but
still imperfect agreement.
[Figure 1: Agreement rates across different marking approaches (agreement with reference mark).
General impression marking: 40%; rubric-based human marking: 69%; Eduface AI alignment with
lecturer marks (UK pilots): 95%. General impression marking shows low inter-rater reliability;
rubric-based human marking improves consistency; Eduface AI achieves 95% alignment with lecturer
marks in UK pilots. Sources: Bloxham (2009); Eduface UK pilot data (2024–2025).]
This does not mean human judgement is wrong. It means that marking is harder and more
variable than institutions typically acknowledge. Any serious conversation about AI essay
marking has to start with this context rather than treating human marking as a fixed,
reliable benchmark.
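To make the idea of marker agreement concrete, here is a minimal Python sketch of two common ways to compare a pair of markers: the share of scripts where their marks fall within a chosen tolerance, and the Pearson correlation between their marks. The function names, the tolerance values, and the marks themselves are illustrative assumptions, not data or definitions taken from the studies or pilots cited here.

```python
from math import sqrt


def agreement_rate(marks_a, marks_b, tolerance=0):
    """Share of scripts where two markers differ by at most `tolerance` marks."""
    if len(marks_a) != len(marks_b) or not marks_a:
        raise ValueError("need two equal-length, non-empty lists of marks")
    within = sum(1 for a, b in zip(marks_a, marks_b) if abs(a - b) <= tolerance)
    return within / len(marks_a)


def pearson(marks_a, marks_b):
    """Pearson correlation between two markers' marks for the same scripts."""
    n = len(marks_a)
    mean_a = sum(marks_a) / n
    mean_b = sum(marks_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(marks_a, marks_b))
    var_a = sum((a - mean_a) ** 2 for a in marks_a)
    var_b = sum((b - mean_b) ** 2 for b in marks_b)
    return cov / sqrt(var_a * var_b)


# Made-up marks for five essays, each marked by two people (illustration only).
first_marker = [62, 58, 71, 45, 66]
second_marker = [65, 58, 68, 52, 64]

exact = agreement_rate(first_marker, second_marker)               # identical marks only
close = agreement_rate(first_marker, second_marker, tolerance=3)  # within 3 marks
r = pearson(first_marker, second_marker)
```

The two measures answer different questions. An agreement rate depends heavily on the tolerance chosen, while a correlation can be high even when one marker is consistently harsher than the other, so a figure quoted for one is not directly interchangeable with a figure quoted for the other.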
What does "fair" AI marking actually mean?
Fairness in assessment has several distinct dimensions. It is not simply a question of
whether an AI or a human holds the pen.
Hattie and Timperley (2007) identified that effective feedback must be accurate, timely,
and tied to clear learning goals.[3] Fairness, in that framework, requires that every student
receives feedback that genuinely helps them improve. When marking is delayed by three
to six weeks because of workload constraints, or when feedback varies sharply
depending on which marker a student happens to receive, the system is already failing
the fairness test before AI enters the room.
A fairer approach would look like this:
- Clear, pre-defined rubric criteria shared with students before submission.
- Consistent application of those criteria across all submissions in a cohort.
- Timely, specific feedback that identifies what the student did well and what needs to
  develop.
- Human review and sign-off before grades are released.
AI essay marking, when properly implemented, delivers on all four points. Inconsistency
between markers disappears. Turnaround drops from weeks to days. The lecturer retains
final control over every grade.
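As an illustration of what applying pre-defined rubric criteria consistently can look like, the sketch below turns per-criterion levels into a percentage mark using fixed weights. It is a simplified, hypothetical model: the criterion names, weights, level scale, and the `rubric_mark` function are assumptions made for this example and do not describe how Eduface computes marks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float   # share of the final mark; all weights must sum to 1
    max_level: int  # highest level descriptor on the rubric, e.g. 5


def rubric_mark(criteria, levels):
    """Combine the level awarded on each criterion into a mark out of 100."""
    if abs(sum(c.weight for c in criteria) - 1.0) > 1e-9:
        raise ValueError("criterion weights must sum to 1")
    mark = 0.0
    for c in criteria:
        level = levels[c.name]  # a missing criterion raises KeyError
        if not 0 <= level <= c.max_level:
            raise ValueError(f"level for {c.name!r} is outside 0..{c.max_level}")
        mark += c.weight * level / c.max_level
    return round(mark * 100, 1)


# Hypothetical rubric, fixed and shared with students before submission.
RUBRIC = [
    Criterion("argument quality", weight=0.4, max_level=5),
    Criterion("use of evidence", weight=0.4, max_level=5),
    Criterion("structure", weight=0.2, max_level=5),
]

# Levels awarded to one essay; every essay in the cohort uses the same RUBRIC.
essay_mark = rubric_mark(
    RUBRIC,
    {"argument quality": 4, "use of evidence": 3, "structure": 5},
)
print(essay_mark)  # 76.0
```

Because the weights and level scale are fixed before marking starts, two essays awarded the same levels always receive the same mark, whoever or whatever assigned the levels. The judgement about which level an essay deserves on each criterion is still the hard part.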
How does AI essay marking work in practice?
The term "AI essay marking" covers a wide range of implementations, from simple
keyword detection to sophisticated language models trained on large datasets of
assessed student work. What matters for fairness is not the technology itself but the
methodology and governance around it.
[Figure 2: Eduface's AI marking workflow: (1) student submits via LMS; (2) AI scores against
rubric criteria; (3) lecturer reviews and may edit; (4) grade released with feedback; (5) student
acts on feedback. Two lecturer modes are available: blind mode (lecturer marks independently
first, AI grades revealed for calibration afterwards) and AI-visible mode (lecturer reviews and
can edit or override AI marks before any grade is released). The lecturer holds final authority
in both modes.]
Eduface supports two marking modes, both of which keep the lecturer in control. In blind
mode, the lecturer completes their own marking before Eduface reveals the AI grades.
Because the lecturer has not yet seen the AI grade, it cannot sway their initial mark, and
comparing the two afterwards lets them check their own consistency. In AI-visible mode,
the AI grades are shown upfront and the lecturer can edit or override any mark before it is
released to students. Institutions can set which mode is available or mandatory for their
staff, which gives them institution-level governance over the process.
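The two modes can be pictured as a small state model in which no mark is releasable until a lecturer has signed it off. The Python sketch below is only an illustration of that governance idea; `Mode`, `Submission`, `visible_ai_mark`, `sign_off`, and `released_mark` are hypothetical names invented for this example and are not part of any Eduface API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    BLIND = "blind"
    AI_VISIBLE = "ai_visible"


@dataclass
class Submission:
    student_id: str
    ai_mark: float
    lecturer_mark: Optional[float] = None
    signed_off: bool = False


def visible_ai_mark(sub: Submission, mode: Mode) -> Optional[float]:
    """AI mark the lecturer may see right now, or None while it is hidden."""
    if mode is Mode.BLIND and sub.lecturer_mark is None:
        return None  # blind mode: hidden until the lecturer has marked
    return sub.ai_mark


def sign_off(sub: Submission, mode: Mode, lecturer_mark: Optional[float] = None) -> None:
    """Record the lecturer's decision; only this step makes a mark releasable."""
    if lecturer_mark is None:
        if mode is Mode.BLIND:
            raise ValueError("blind mode requires an independent lecturer mark")
        lecturer_mark = sub.ai_mark  # AI-visible mode: AI mark accepted unchanged
    sub.lecturer_mark = lecturer_mark
    sub.signed_off = True


def released_mark(sub: Submission) -> float:
    """The mark that goes to the student; refuses anything not signed off."""
    if not sub.signed_off:
        raise PermissionError("no grade is released without lecturer sign-off")
    return sub.lecturer_mark


# AI-visible mode: the lecturer sees the AI mark and overrides it.
essay = Submission("s-001", ai_mark=64.0)
sign_off(essay, Mode.AI_VISIBLE, lecturer_mark=67.0)
print(released_mark(essay))  # 67.0
```

In this sketch the released mark is always the lecturer's mark, even when the lecturer accepts the AI mark unchanged, which mirrors the principle that the lecturer holds final authority in both modes.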
Does Eduface's AI marking hold up against lecturer judgement?
Pilot data from UK institutions using Eduface shows 95% alignment between AI-
generated marks and lecturer marks. This figure is derived from live assessments across
written assignments and exam questions, with marks verified against the same rubric by
both the AI and the human marker.
The 5% of cases where the AI and the lecturer diverge are precisely the cases where
human review adds the most value: edge cases, ambiguous arguments, or work that
requires contextual knowledge the rubric does not fully capture. Eduface flags these
cases for closer review rather than releasing them automatically.
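One simple way to picture divergence flagging is to compare the AI mark and the lecturer mark for each submission and queue any pair that differs by more than a set threshold, largest gap first. The sketch below assumes both marks exist (as they would after blind-mode marking); the 5-mark threshold, the function name, and the sample marks are hypothetical, and the article does not specify how Eduface defines alignment or decides what to flag.

```python
def review_queue(marks, tolerance=5.0):
    """Split submissions into aligned IDs and a flagged list sorted by gap size.

    `marks` maps a submission ID to an (ai_mark, lecturer_mark) pair.
    """
    aligned, flagged = [], []
    for submission_id, (ai_mark, lecturer_mark) in marks.items():
        gap = abs(ai_mark - lecturer_mark)
        if gap <= tolerance:
            aligned.append(submission_id)
        else:
            flagged.append((submission_id, gap))
    flagged.sort(key=lambda item: item[1], reverse=True)  # biggest disagreement first
    return aligned, flagged


# Made-up marks for four essays (illustration only).
sample = {
    "e1": (62.0, 60.0),
    "e2": (48.0, 58.0),
    "e3": (70.0, 70.0),
    "e4": (55.0, 52.0),
}

aligned, flagged = review_queue(sample)
alignment_rate = len(aligned) / len(sample)
print(flagged)         # [('e2', 10.0)]
print(alignment_rate)  # 0.75
```

Sorting by gap size puts the submissions where review is most likely to change the outcome at the top of the lecturer's list.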
Pilot finding: Across UK pilot programmes including Bath Spa University, Eduface
AI marks aligned with lecturer marks in 95% of cases. Where divergence
occurred, lecturer override took an average of under three minutes per
assignment.
Falchikov and Goldfinch (2000) conducted a meta-analysis comparing peer assessment
marks to lecturer marks and found correlations typically in the range of 0.60 to 0.80,
depending on the assessment type and training provided.[4] AI marking, trained on verified
assessor data and applied against a consistent rubric, outperforms this benchmark.
What does the EU AI Act require for AI essay marking?
The EU AI Act (Regulation (EU) 2024/1689) classifies AI systems used in educational
assessment as high-risk under Annex III, point 3(b).[5] Point 3(b) covers AI used to evaluate
learning outcomes, which includes automated exam scoring and the evaluation of academic
performance; admission and placement decisions fall under neighbouring sub-points of the
same education category. High-risk classification does not prohibit use: it requires compliance.
The two most relevant obligations for institutions deploying AI essay marking tools are
Article 14 (human oversight, meaning a qualified human must be able to review, override,
and bear responsibility for every consequential decision) and Article 13 (transparency,
meaning students and staff must be informed that AI is involved and understand how the
system reaches its outputs).
Eduface is designed around both requirements. Human override is built into the workflow,
not added as an afterthought. Feedback generated by the AI explains the reasoning
behind each mark, rather than delivering a score without justification. Institutions in the
UK and EU can deploy Eduface with confidence that the governance model is aligned with
regulatory expectations.
| Assessment approach | Consistency | Turnaround | Feedback quality | Human oversight | EU AI Act compliant |
|---|---|---|---|---|---|
| General impression marking (human) | Low | Variable | Variable | Yes | N/A |
| Rubric-based marking (human) | Moderate | Variable | Moderate | Yes | N/A |
| Unreviewed AI grading (no human step) | High | Fast | Moderate | No | No |
| Eduface: blind mode | High | Fast | High | Yes | Yes |
| Eduface: AI-visible mode | High | Fast | High | Yes | Yes |
Frequently asked questions
Can AI marking detect plagiarism or AI-generated student work?
AI marking tools such as Eduface assess the quality and content of a submitted piece of
work against a rubric. Plagiarism and AI-generated content detection are separate
functions, typically handled by dedicated tools such as Turnitin. The two systems are
complementary and should be used in conjunction, not as substitutes for each other.
Will students trust an AI-marked grade?
Student trust depends on transparency. When students are told in advance that AI
provides a first-pass mark, that a lecturer reviews every grade, and that they can request a
human review, trust levels are comparable to existing marking processes. In NSS-focused
institutions, the bigger trust issue is often delayed, generic feedback rather than who
produced it.
How does Eduface handle subjectivity in essay marking?
Eduface marks against the rubric criteria the lecturer defines. Where the rubric captures
the judgement (argument quality, use of evidence, structure), the AI applies it consistently.
Where the rubric cannot capture nuance, Eduface flags the case for closer lecturer review.
The system is designed to support, not replace, academic judgement on genuinely
complex cases.
What types of written assessment can Eduface mark?
Eduface covers written assignments (essays, case studies, reflective reports, short-
answer questions), written exam questions, and open-ended assessments. Eduface also
has a dedicated model for oral and spoken examinations. The system operates across all
major LMS platforms including Blackboard, Brightspace, Moodle, and Canvas.
Is Eduface approved for use in UK institutions?
Yes. Eduface is an approved supplier on the Jisc/CHEST framework, which means UK
institutions can procure the platform without running a separate tender process. Eduface is
also on the HEAnet framework in Ireland. All processing runs on proprietary GPU
infrastructure in the Netherlands and does not rely on third-party AI APIs such as OpenAI.
The question is not whether AI can mark fairly
The evidence from both academic research and live pilot programmes shows that rubric-
based AI assessment, with mandatory human review, produces marks that are at least as
consistent as human-to-human marking. The more accurate question is whether an
institution's current marking process is as consistent and fair as it believes. For most, the
honest answer requires examining the evidence with the same rigour they would apply to
any other quality assurance question.
See Eduface in action
Find out how Eduface fits into your institution's assessment
workflow. Request a free demo or create a free lecturer account to
try it with your own assignments.
References
[1] Bloxham, S. (2009). Marking and moderation in the UK: false assumptions and wasted resources.
Assessment & Evaluation in Higher Education, 34(2), 209–220.
[2] Sadler, D. R. (2009). Grade integrity and the representation of academic achievement.
Studies in Higher Education, 34(7), 807–826.
[3] Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.
[4] Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis
comparing peer and teacher marks. Review of Educational Research, 70(3), 287–322.
[5] European Parliament and Council of the EU. (2024). Regulation (EU) 2024/1689 (Artificial Intelligence
Act). Official Journal of the European Union.
© 2026 Eduface | eduface.me | AI-powered assessment and feedback for higher education