Assessment & Feedback

Can AI Mark Essays Fairly? What the Research and Our Pilots Actually Show

Published by Eduface  |  May 2026  |  9 min read

You returned 80 essays last semester. Somewhere in the back of your mind, a question lingers: would you score them the same way if you marked them again next week?

Research suggests the honest answer is: probably not quite. If marking variability is a problem for experienced humans, what does that mean for AI essay marking? And can an automated system actually do this fairly?

Can AI mark essays fairly?

AI essay marking, when implemented with rubric-based methodology and mandatory human oversight, can match or exceed the consistency of human inter-rater agreement. In Eduface's UK pilot programmes, AI marks aligned with lecturer marks 95% of the time. Fairness depends not on whether a human or an algorithm marks first, but on whether the criteria are clear, the process is transparent, and the final grade remains under human control.

How reliable is human essay marking, really?

This is where the conversation needs to start. AI essay marking is routinely compared against a gold standard that, on examination, is far from solid. Research on marking reliability in UK higher education reveals a picture that should give every institution pause.

Bloxham (2009) reviewed marking and moderation practices across British universities and concluded that current moderation procedures create a substantial administrative burden while adding little to actual mark accuracy. The assumption that double-marking or moderation produces reliable grades is, in her analysis, built on false premises rather than evidence.

Sadler (2009) identified a related problem: grade integrity requires not only consistent criteria but also that markers interpret those criteria in comparable ways. In practice, two experienced academics marking the same essay against the same rubric will frequently arrive at different grades. Studies on general impression marking consistently show low inter-rater correlation, with rubric-based assessment producing meaningfully higher but still imperfect agreement.

Marking Reliability: Human vs AI (agreement with reference mark): general impression marking 40%; rubric-based human marking 69%; Eduface AI alignment (UK pilots) 95%. Sources: Bloxham (2009); Eduface UK pilot data (2024–2025).

Figure 1: Agreement rates across different marking approaches. General impression marking shows low inter-rater reliability; rubric-based human marking improves consistency; Eduface AI achieves 95% alignment with lecturer marks in UK pilots.

This does not mean human judgement is wrong. It means that marking is harder and more variable than institutions typically acknowledge. Any serious conversation about AI essay marking has to start with this context rather than treating human marking as a fixed, reliable benchmark.

What does "fair" AI marking actually mean?

Fairness in assessment has several distinct dimensions. It is not simply a question of whether an AI or a human holds the pen.

Hattie and Timperley (2007) argued that effective feedback must be accurate, timely, and tied to clear learning goals. Fairness, in that framework, requires that every student receive feedback that genuinely helps them improve. When marking is delayed by three to six weeks because of workload constraints, or when feedback varies sharply depending on which marker a student happens to receive, the system is already failing the fairness test before AI enters the room.

A fairer approach would look like this:

Clear, pre-defined rubric criteria shared with students before submission.

Consistent application of those criteria across all submissions in a cohort.

Timely, specific feedback that identifies what the student did well and what they need to develop.

Human review and sign-off before grades are released.

AI essay marking, when properly implemented, supports all four points. Variation between markers is substantially reduced, because every submission is scored against the same criteria in the same way. Turnaround drops from weeks to days. The lecturer retains final control over every grade.

How does AI essay marking work in practice?

The term "AI essay marking" covers a wide range of implementations, from simple keyword detection to sophisticated language models trained on large datasets of assessed student work. What matters for fairness is not the technology itself but the methodology and governance around it.

Eduface AI Marking Workflow: 1. Student submits via LMS; 2. AI scores against rubric criteria; 3. Lecturer reviews and may edit; 4. Grade released with feedback; 5. Student acts on feedback.

Blind mode option: Lecturer marks first without seeing AI grades. AI grades revealed afterwards for calibration.

AI-visible mode option: AI marks shown upfront. Lecturer edits or overrides before any grade is released.

Figure 2: Eduface's AI marking workflow. Two lecturer modes are available: blind mode (lecturer marks independently first, AI grades revealed for calibration afterwards) and AI-visible mode (lecturer reviews and can edit AI marks before release). The lecturer holds final authority in both modes.

Eduface supports two marking modes, both of which keep the lecturer in control. In blind mode, the lecturer completes their own marking before Eduface reveals the AI grades. This lets lecturers check their own consistency and keeps their initial judgement free from anchoring on the AI grade. In AI-visible mode, the AI grades are shown upfront and the lecturer can edit or override any mark before it is released to students. Institutions can set which mode is available or mandatory for their staff, giving them governance over the process at an institutional level.
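The two modes differ only in when the AI mark becomes visible. In both, release depends on lecturer sign-off. The Python sketch below models that rule under stated assumptions. The `Submission` class, its field and method names, and the 0-100 mark scale are hypothetical illustrations, not Eduface's actual data model or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    BLIND = "blind"            # lecturer marks first; AI mark hidden until then
    AI_VISIBLE = "ai_visible"  # AI mark shown upfront for the lecturer to edit


@dataclass
class Submission:
    """Hypothetical record for one submission (not Eduface's real data model)."""
    student_id: str
    ai_mark: int                         # AI first-pass mark, assumed 0-100 scale
    lecturer_mark: Optional[int] = None  # mark entered or confirmed by the lecturer
    signed_off: bool = False

    def visible_ai_mark(self, mode: Mode) -> Optional[int]:
        # Blind mode: the AI mark stays hidden until the lecturer has entered a mark.
        if mode is Mode.BLIND and self.lecturer_mark is None:
            return None
        return self.ai_mark

    def record_lecturer_mark(self, mark: int) -> None:
        # In AI-visible mode, a lecturer who agrees simply records the AI mark.
        self.lecturer_mark = mark

    def sign_off(self) -> None:
        if self.lecturer_mark is None:
            raise ValueError("lecturer must enter or confirm a mark before sign-off")
        self.signed_off = True

    def released_mark(self) -> int:
        # Both modes: nothing is released without lecturer sign-off, and the
        # lecturer's mark (not the AI's) is the one released.
        if not self.signed_off or self.lecturer_mark is None:
            raise PermissionError("grade cannot be released before lecturer sign-off")
        return self.lecturer_mark


# Blind-mode walk-through with made-up marks.
s = Submission("s001", ai_mark=64)
assert s.visible_ai_mark(Mode.BLIND) is None  # hidden before the lecturer marks
s.record_lecturer_mark(60)
print(s.visible_ai_mark(Mode.BLIND))  # 64, revealed for calibration
s.sign_off()
print(s.released_mark())  # 60
```

The release check lives in one place, and only the visibility rule depends on the mode. This mirrors the point that both modes end with the same lecturer sign-off.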

Does Eduface's AI marking hold up against lecturer judgement?

Pilot data from UK institutions using Eduface shows 95% alignment between AI-generated marks and lecturer marks. This figure is derived from live assessments across written assignments and exam questions, with the AI and the human marker each scoring against the same rubric.
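The article does not define exactly how "alignment" is counted. One common approach is to count the share of submissions where the AI mark falls within a chosen tolerance of the lecturer's mark. The minimal Python sketch below shows this. The 0-100 scale, the 5-mark tolerance, the function names and the example marks are all hypothetical, not Eduface's published method or pilot data.

```python
from typing import List, Sequence


def alignment_rate(ai_marks: Sequence[float], lecturer_marks: Sequence[float],
                   tolerance: float = 0.0) -> float:
    """Share of submissions where the AI mark is within `tolerance` of the lecturer's.

    tolerance=0 counts exact agreement only; a positive tolerance counts
    "close enough" agreement. The right tolerance is an assumption to agree
    locally (for example, staying within the same grade band).
    """
    if len(ai_marks) != len(lecturer_marks) or not ai_marks:
        raise ValueError("need two non-empty mark lists of equal length")
    hits = sum(1 for a, h in zip(ai_marks, lecturer_marks) if abs(a - h) <= tolerance)
    return hits / len(ai_marks)


def divergent_cases(ai_marks: Sequence[float], lecturer_marks: Sequence[float],
                    tolerance: float) -> List[int]:
    """Indices of submissions where the two marks differ by more than `tolerance`."""
    return [i for i, (a, h) in enumerate(zip(ai_marks, lecturer_marks))
            if abs(a - h) > tolerance]


# Made-up marks for five submissions on a 0-100 scale.
ai = [62, 58, 71, 45, 68]
lecturer = [60, 58, 65, 47, 69]

print(alignment_rate(ai, lecturer))                # 0.2: only one exact match
print(alignment_rate(ai, lecturer, tolerance=5))   # 0.8: four within 5 marks
print(divergent_cases(ai, lecturer, tolerance=5))  # [2]: candidate for closer review
```

The same marks give very different rates depending on the tolerance. The definition therefore matters when comparing alignment figures.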

The 5% of cases where the AI and the lecturer diverge are precisely the cases where human review adds the most value: edge cases, ambiguous arguments, or work that requires contextual knowledge the rubric does not fully capture. Eduface flags these cases for closer review rather than releasing them automatically.

Pilot finding: Across UK pilot programmes, including at Bath Spa University, Eduface AI marks aligned with lecturer marks in 95% of cases. Where divergence occurred, lecturer override took an average of under three minutes per assignment.

Falchikov and Goldfinch (2000) conducted a meta-analysis comparing peer assessment marks with lecturer marks and found correlations typically in the range of 0.60 to 0.80, depending on the assessment type and training provided. AI marking, trained on verified assessor data and applied against a consistent rubric, compares favourably with this benchmark. A correlation coefficient and a percentage agreement rate are different measures, however, so the two figures are not directly interchangeable.

What does the EU AI Act require for AI essay marking?

The EU AI Act (Regulation 2024/1689) classifies AI systems used in education and vocational training as high-risk under Annex III, point 3. Point 3(b) covers systems intended to evaluate learning outcomes, which takes in automated exam scoring and the evaluation of academic performance. Neighbouring points in the same section cover admission and placement decisions. High-risk classification does not prohibit use: it requires compliance.

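The two requirements most relevant to AI essay marking tools are Article 14 and Article 13. Article 14 covers human oversight: the system must be designed so that people with the necessary competence can effectively oversee it, including deciding to disregard, override or reverse its output. Article 13 covers transparency: the system must operate transparently enough for the deploying institution to interpret its outputs and use them appropriately, supported by clear instructions for use. In addition, Article 26 requires institutions deploying such systems to inform the people affected, here students, that a high-risk AI system is being used in decisions about them.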

Eduface is designed around both requirements. Human override is built into the workflow, not added as an afterthought. Feedback generated by the AI explains the reasoning behind each mark, rather than delivering a score without justification. Institutions in the UK and EU can deploy Eduface with confidence that the governance model is aligned with regulatory expectations.

Assessment approach  |  Consistency  |  Turnaround  |  Feedback quality  |  Human oversight  |  EU AI Act compliant
General impression marking (human)  |  Low  |  Variable  |  Variable  |  Yes  |  N/A
Rubric-based marking (human)  |  Moderate  |  Variable  |  Moderate  |  Yes  |  N/A
Unreviewed AI grading (no human step)  |  High  |  Fast  |  Moderate  |  No  |  No
Eduface: blind mode  |  High  |  Fast  |  High  |  Yes  |  Yes
Eduface: AI-visible mode  |  High  |  Fast  |  High  |  Yes  |  Yes

Frequently asked questions

Can AI marking detect plagiarism or AI-generated student work?

AI marking tools such as Eduface assess the quality and content of a submitted piece of work against a rubric. Plagiarism and AI-generated content detection are separate functions, typically handled by dedicated tools such as Turnitin. The two systems are complementary and should be used together, not as substitutes for each other.

Will students trust an AI-marked grade?

Student trust depends on transparency. When students are told in advance that AI provides a first-pass mark, that a lecturer reviews every grade, and that they can request a human review, trust levels are comparable to existing marking processes. In institutions focused on National Student Survey (NSS) results, the bigger trust issue is often delayed, generic feedback rather than who produced it.

How does Eduface handle subjectivity in essay marking?

Eduface marks against the rubric criteria the lecturer defines. Where the rubric captures the judgement (argument quality, use of evidence, structure), the AI applies it consistently. Where the rubric cannot capture nuance, Eduface flags the case for closer lecturer review. The system is designed to support, not replace, academic judgement on genuinely complex cases.

What types of written assessment can Eduface mark?

Eduface covers written assignments (essays, case studies, reflective reports, short-answer questions), written exam questions, and open-ended assessments. Eduface also has a dedicated model for oral and spoken examinations. The system operates across all major LMS platforms, including Blackboard, Brightspace, Moodle, and Canvas.

Is Eduface approved for use in UK institutions?

Yes. Eduface is an approved supplier on the Jisc/CHEST framework, which means UK institutions can procure the platform without running a separate tender process. Eduface is also on the HEAnet framework in Ireland. All processing runs on proprietary GPU infrastructure in the Netherlands and does not rely on third-party AI APIs such as OpenAI's.

The question is not whether AI can mark fairly

The evidence from both academic research and live pilot programmes shows that rubric-based AI assessment, with mandatory human review, produces marks that are at least as consistent as human-to-human marking. The better question is whether an institution's current marking process is as consistent and fair as it believes it to be. For most, the honest answer requires examining the evidence with the same rigour they would apply to any other quality assurance question.

See Eduface in action

Find out how Eduface fits into your institution's assessment workflow. Request a free demo or create a free lecturer account to try it with your own assignments.


References

Bloxham, S. (2009). Marking and moderation in the UK: false assumptions and wasted resources. Assessment & Evaluation in Higher Education, 34(2), 209–220.

Sadler, D. R. (2009). Grade integrity and the representation of academic achievement. Studies in Higher Education, 34(7), 807–826.

Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.

Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks. Review of Educational Research, 70(3), 287–322.

European Parliament and Council of the EU. (2024). Regulation (EU) 2024/1689 (Artificial Intelligence Act). Official Journal of the European Union.

© 2026 Eduface  |  eduface.me  |  AI-powered assessment and feedback for higher education