Compliance & Ethics

Human-in-the-Loop AI: What It Means in Practice for Automated Assessment

What Article 14 of the EU AI Act actually requires, and how to tell whether an AI assessment platform genuinely implements it or just claims to.

Every major AI assessment platform now advertises "human-in-the-loop" as a feature. The phrase appears on pricing pages, in compliance statements, and in sales calls. But what it actually means in practice varies enormously: from a lecturer who genuinely reviews every grade before it reaches a student, to a nominal confirmation click that happens so quickly it provides no meaningful oversight at all.

The distinction matters. The EU AI Act, which came into force in August 2024, classifies AI assessment tools as high-risk systems and places specific legal obligations on institutions that deploy them. "Human-in-the-loop" as a checkbox is not enough. What regulators (and good academic governance) require is substantive human oversight: the ability to understand, challenge, and override AI output in ways that are genuinely meaningful.

What does human-in-the-loop mean for AI assessment?

Human-in-the-loop AI assessment means a qualified lecturer reviews, can adjust, and explicitly approves AI-generated grades and feedback before students receive them. Under EU AI Act Article 14, this oversight must be genuine: not a one-second rubber-stamp that waves the AI's output through, but a meaningful review in which the educator understands the AI's recommendations and takes accountability for the final grade.

What does "human-in-the-loop" actually mean?

The term comes from engineering, where it describes systems that incorporate human judgment at critical decision points rather than operating fully autonomously. Applied to AI assessment, it means the AI produces a suggested grade and feedback, and a human (in this context, a lecturer) reviews that suggestion before it becomes the final output that reaches the student.

At minimum, this means:

The lecturer can see every AI-suggested score and comment before release.

The lecturer can edit, override, or reject any AI output.

Student-facing grades are not released until the lecturer explicitly approves them.

The lecturer takes academic and legal responsibility for the final grade.

What it does not mean is that the AI output is shown to lecturers and then automatically sent to students unless the lecturer actively objects. That workflow inverts the oversight relationship: it places the burden on the human to catch AI errors rather than placing the burden on the AI to earn human approval.

What does the EU AI Act specifically require?

The EU AI Act (Regulation 2024/1689) classifies AI systems intended to evaluate students' learning outcomes, including when those outcomes are used to steer the learning process, as high-risk under Annex III, point 3(b). The classification covers education and vocational training at all levels, so it includes AI tools used in higher and further education. As high-risk systems, AI assessment tools must comply with a range of obligations, including Article 14, which governs human oversight.

Article 14 sets out what "human oversight" means in legal terms for high-risk AI systems. Under Article 14(4), the system must be provided in a way that enables the people responsible for oversight to:

a

Understand the system's capabilities and limitations, and detect anomalies

Educators must have sufficient information to know when the AI's output can be trusted and when it requires closer scrutiny. This requires transparency about how the model works and what kinds of errors it tends to make. The same point covers monitoring: the oversight workflow must make it reasonably easy to spot submissions where the AI grade seems wrong, not just to review a dashboard of summary statistics but to interrogate individual cases.

b

Remain aware of automation bias

Article 14 explicitly names the risk that humans may over-rely on AI output. A compliant system must be designed to counter this tendency, not to encourage the human reviewer to accept AI suggestions as defaults.

c

Correctly interpret the system's output

Grades and feedback produced by the AI must be explainable enough for the lecturer to understand why a particular score was assigned, rather than presenting a black-box number.

d

Decide not to use the output

The lecturer must have the genuine practical ability to disregard, override, or reverse the AI's output in any particular situation, even to reject all AI suggestions for a batch if circumstances require it.

e

Intervene or stop the system

The person overseeing the system must be able to intervene in its operation or interrupt it through a "stop" button or a similar procedure that brings it to a halt in a safe state.

For UK institutions: The UK is not bound by the EU AI Act and has taken its own approach to AI regulation. While distinct, that approach shares the principle that AI used in sensitive contexts requires demonstrable human accountability. UK universities operating under OfS regulatory requirements should treat EU AI Act compliance standards as a reasonable baseline.

Why "token oversight" fails the legal and ethical test

Legal scholars and regulators have identified automation bias as the central challenge for human-in-the-loop AI requirements. Automation bias is the documented tendency for humans to accept AI recommendations without critical scrutiny, particularly when they are busy, when the AI system appears confident, and when deviating from the AI's suggestion feels costly.

Fink's 2025 working paper on human oversight under Article 14 of the EU AI Act, posted to SSRN, makes the point directly: if a human reviewer approves AI output in under a second across hundreds of submissions, regulators are likely to treat this as evidence that no meaningful oversight occurred. A theoretical right to override the AI is not equivalent to the practical exercise of that right.

"If a human operator approves AI suggestions in one second across 200 submissions, this is not oversight — it is automation bias in action. Article 14 requires meaningful review, not a nominal one."

Fink, M. (2025). Human Oversight under Article 14 of the EU AI Act. SSRN.
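One way an institution might check for the pattern Fink describes is to look at how long lecturers actually spend on each submission before approving it. The sketch below assumes a hypothetical audit log in which every approval records when the lecturer opened the submission and when they approved it. The `ApprovalEvent` record, the function names, and the 5-second threshold are illustrative assumptions, not anything the EU AI Act or a particular platform defines.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class ApprovalEvent:
    """One lecturer approval of one submission (hypothetical audit-log record)."""
    submission_id: str
    opened_at: float    # seconds since epoch when the lecturer opened the submission
    approved_at: float  # seconds since epoch when the lecturer approved it


def review_seconds(events: List[ApprovalEvent]) -> List[float]:
    """Time spent on each submission between opening it and approving it."""
    return [e.approved_at - e.opened_at for e in events]


def looks_like_rubber_stamping(events: List[ApprovalEvent],
                               min_median_seconds: float = 5.0) -> bool:
    """Flag a batch whose median review time per submission falls below a threshold.

    The 5-second default is an illustrative assumption, not a regulatory figure.
    """
    if not events:
        return False
    return median(review_seconds(events)) < min_median_seconds
```

A batch of 200 submissions each approved 0.4 seconds after being opened would be flagged; a run of four-minute reviews would not.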

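This creates a practical design challenge for AI assessment platforms: the interface and workflow must be designed to encourage genuine review, not to minimise the friction between receiving AI output and forwarding it to students. A single "approve all" button that releases every AI-generated grade in one click after a summary screen does not meet the Article 14 standard. Designing for genuine review includes: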

Presenting AI grades and feedback in a way that invites scrutiny rather than passive acceptance.

Making it easy to drill down into individual submissions.

Designing the approval step as a deliberate action, not a passive default.

Providing lecturers with enough information about the AI's reasoning to assess its reliability on a per-submission basis.

Article 14 compliance: two contrasting workflows

Non-compliant workflow (✗ fails Article 14): AI grades all submissions → grades are sent to students automatically → the lecturer "can object" (the burden is on the human to catch errors).

Compliant workflow (✓ satisfies Article 14): AI grades all submissions → grades are held while the lecturer reviews → the lecturer edits any scores or comments → the lecturer explicitly approves → grades are released to students.

In the non-compliant workflow, the AI output is the default and the human is optional. In the compliant workflow, human approval is required and the AI output is a recommendation.

What does genuine human-in-the-loop oversight look like in practice?

In a well-designed AI assessment platform, the human review step is not an afterthought. It is the point around which the entire workflow is organised. Here is what this looks like in concrete operational terms.

The AI produces, the lecturer decides

The AI processes each submission against the rubric and returns a suggested score for every criterion plus a written justification and feedback comment for the student. These are suggestions, not grades. They sit in a review queue that only the lecturer can release, and the system stays in a holding state until the lecturer acts.
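The holding state can be pictured as a small state machine in which the only route to the student is an explicit, attributed approval. This is a minimal sketch under assumed names (`Grade`, `GradeState`, `approve`); it is not Eduface's data model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class GradeState(Enum):
    HELD = "held"          # AI suggestion stored; nothing visible to the student
    RELEASED = "released"  # lecturer has explicitly approved; visible to the student


@dataclass
class Grade:
    submission_id: str
    ai_suggestion: Dict[str, int]  # rubric criterion -> AI-suggested score
    final_scores: Dict[str, int] = field(default_factory=dict)
    state: GradeState = GradeState.HELD
    approved_by: Optional[str] = None

    def approve(self, lecturer_id: str, edits: Optional[Dict[str, int]] = None) -> None:
        """The only transition to RELEASED: an explicit action by a named lecturer.

        Edited criteria replace the AI suggestion; unedited criteria keep the
        suggested score, which the lecturer is accepting by approving.
        """
        if self.state is GradeState.RELEASED:
            raise ValueError("grade already released")
        edits = edits or {}
        unknown = set(edits) - set(self.ai_suggestion)
        if unknown:
            raise KeyError(f"not rubric criteria for this grade: {sorted(unknown)}")
        self.final_scores = {**self.ai_suggestion, **edits}
        self.approved_by = lecturer_id
        self.state = GradeState.RELEASED

    def student_view(self) -> Optional[Dict[str, int]]:
        """Students see nothing until the grade has been released."""
        if self.state is GradeState.RELEASED:
            return dict(self.final_scores)
        return None
```

Note what the sketch leaves out: there is no timer or background job that moves a grade from `HELD` to `RELEASED`. Adding one would recreate the opt-out workflow in which grades reach students unless the lecturer objects.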

The dashboard supports meaningful review

The lecturer sees a summary across the whole cohort: average scores per criterion, any submissions flagged as outliers (unusually high or low), and a distribution view that makes it easy to spot whether the AI has been systematically lenient or strict. The lecturer can filter to review only the outliers first, or work through the full batch submission by submission.
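The outlier flag and per-criterion averages described above can be computed from the AI's suggested scores alone. The sketch assumes scores arrive as a mapping from submission ID to per-criterion scores and uses a plain z-score on each submission's total. Both the data shape and the 2.0 threshold are illustrative assumptions; a production dashboard might use a different outlier rule.

```python
from statistics import mean, pstdev
from typing import Dict, List

Scores = Dict[str, Dict[str, float]]  # submission_id -> {criterion: AI-suggested score}


def criterion_averages(scores: Scores) -> Dict[str, float]:
    """Cohort-wide average per rubric criterion (assumes every submission has the same criteria)."""
    if not scores:
        return {}
    criteria = next(iter(scores.values())).keys()
    return {c: mean([s[c] for s in scores.values()]) for c in criteria}


def flag_outliers(scores: Scores, z_threshold: float = 2.0) -> List[str]:
    """Submissions whose total is unusually high or low relative to the cohort.

    Uses a plain z-score on totals; the 2.0 threshold is an illustrative assumption.
    """
    totals = {sid: sum(s.values()) for sid, s in scores.items()}
    if not totals:
        return []
    mu = mean(list(totals.values()))
    sigma = pstdev(list(totals.values()))
    if sigma == 0:
        return []  # identical totals: nothing stands out
    return sorted(sid for sid, t in totals.items() if abs(t - mu) / sigma > z_threshold)
```

For a cohort of nine submissions totalling 10 and one totalling 30, only the last is flagged: its total sits three standard deviations above the cohort mean of 12.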

Every AI decision is explainable

For each submission, the lecturer can see not just the score but the reasoning: which passages from the student's work the AI linked to which rubric criterion, and why a particular score level was assigned. This is what Article 14(4)(c), the requirement to correctly interpret the system's output, demands in practice. A black-box score with no explanation does not satisfy this requirement.
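An explanation of this kind is easiest to audit when it is stored as structured data rather than free text. The sketch below assumes a hypothetical record in which each criterion judgement carries a rationale and the character offsets of the passages it relies on; `CriterionJudgement`, `Evidence`, and the helper functions are illustrative names only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    start: int   # character offset where the cited passage begins
    end: int     # character offset just past the end of the passage
    quote: str   # the passage the AI linked to the criterion


@dataclass
class CriterionJudgement:
    criterion: str            # rubric criterion, e.g. "argument"
    suggested_level: str      # AI-suggested score level on a hypothetical rubric
    rationale: str            # why this level was assigned
    evidence: List[Evidence]  # passages from the student's work the AI relied on


def unexplained(judgements: List[CriterionJudgement]) -> List[str]:
    """Criteria that arrive with no rationale or no linked passage: the
    'black-box number' case that needs closer scrutiny."""
    return [j.criterion for j in judgements if not j.rationale.strip() or not j.evidence]


def evidence_matches(submission_text: str, judgement: CriterionJudgement) -> bool:
    """True if every cited passage appears in the submission at the stated offsets."""
    return all(submission_text[e.start:e.end] == e.quote for e in judgement.evidence)
```

Checking that each quoted passage actually appears at the stated offsets guards against an explanation that cites text the student never wrote.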

Override is easy, not exceptional

The interface makes it as easy to change a score as to accept it. There is no friction introduced to discourage adjustment. If a lecturer wants to revise every single AI suggestion, the system supports that. The approval step requires a deliberate action (clicking to release to students) rather than a passive default that sends grades automatically unless interrupted.

Two grading modes, one underlying principle

One design choice that has significant implications for Article 14 compliance is whether the AI's suggestion is shown to the lecturer before or after they have formed their own assessment. Both approaches can be compliant, but they have different effects on automation bias.

AI-visible mode

How it works: The lecturer sees AI scores and feedback upfront, then reviews, adjusts, and approves them before release.

Automation bias risk: Moderate. The AI suggestion is visible, but approval is required.

Best suited for: High-volume cohorts where full independent marking is impractical; experienced lecturers comfortable calibrating against AI.

Blind mode

How it works: The lecturer marks independently first; the AI score is revealed only after the lecturer has recorded their assessment.

Automation bias risk: Low. The AI cannot anchor the lecturer's independent judgment.

Best suited for: High-stakes assessments; institutions with strong academic integrity concerns; contexts where marking consistency is being audited.

Blind mode directly addresses the automation bias risk that Article 14(4)(b) flags. Research on anchoring effects in assessment shows that when evaluators see a suggested score before forming their own judgment, that suggestion exerts a measurable influence, even when evaluators believe they are making an independent decision. Blind mode structurally prevents this by separating the independent judgment from the AI calibration step.
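Blind mode's ordering rule can be enforced in the data model itself: the AI's score is stored but cannot be read until the lecturer's independent mark has been committed, and that mark cannot be rewritten afterwards. A minimal sketch, with `BlindReview` and its methods as hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BlindReview:
    submission_id: str
    ai_score: float = field(repr=False)     # stored, but not shown until the lecturer commits
    lecturer_score: Optional[float] = None  # the lecturer's independent mark

    def record_lecturer_score(self, score: float) -> None:
        """Commit the independent mark; it cannot be rewritten after the reveal."""
        if self.lecturer_score is not None:
            raise ValueError("independent mark already recorded")
        self.lecturer_score = score

    def reveal_ai_score(self) -> float:
        """The AI suggestion becomes visible only after the independent mark exists."""
        if self.lecturer_score is None:
            raise PermissionError("record your own mark before viewing the AI suggestion")
        return self.ai_score

    def discrepancy(self) -> float:
        """Gap between the AI suggestion and the independent mark, for calibration."""
        return abs(self.reveal_ai_score() - self.lecturer_score)
```

In Python nothing stops other code from reading the `ai_score` attribute directly. In a real deployment the equivalent guarantee would more plausibly come from the server not sending the AI score to the lecturer's browser until the independent mark is saved.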

In Eduface: Institutions can configure which mode is used at course or assignment level. Blind mode is the recommended setting for high-stakes summative assessments. AI-visible mode suits formative feedback cycles where speed is the primary value and the AI's first-pass suggestions are the main time-saving mechanism.
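Configuring the mode at course or assignment level implies a precedence rule. The sketch assumes the assignment-level setting wins over the course-level one and, where neither is set, falls back to blind mode for summative work and AI-visible mode for formative work. That fallback mirrors the recommendation above but is an assumption, not documented Eduface behaviour; `Mode` and `resolve_mode` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    AI_VISIBLE = "ai_visible"  # lecturer sees the AI suggestion up front
    BLIND = "blind"            # AI suggestion revealed only after the lecturer's own mark


def resolve_mode(course_setting: Optional[Mode],
                 assignment_setting: Optional[Mode],
                 summative: bool) -> Mode:
    """Pick the grading mode for one assignment.

    Precedence (an assumption for this sketch): assignment setting, then course
    setting, then a fallback of blind for summative work, AI-visible otherwise.
    """
    if assignment_setting is not None:
        return assignment_setting
    if course_setting is not None:
        return course_setting
    return Mode.BLIND if summative else Mode.AI_VISIBLE
```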

What should institutions look for when procuring AI assessment tools?

The growing number of AI assessment tools on the market makes it difficult to distinguish between platforms that genuinely implement human-in-the-loop as a workflow principle and those that use the phrase as a marketing claim. When evaluating tools for procurement, the following questions cut through the ambiguity.

Are grades held in a review state until the lecturer explicitly approves them, or are they released automatically unless the lecturer intervenes?

Can the lecturer see the AI's reasoning for each score, not just the final number?

Does the platform offer a blind mode where the AI's suggestion is withheld until after the lecturer has made their own assessment?

Is it as easy to override the AI's suggestion as to accept it, with no friction introduced to discourage adjustment?

Does the platform provide EU AI Act compliance documentation covering Articles 11, 13, and 14?

Is student data processed in the EU, and does the vendor sign a GDPR Data Processing Agreement?

Can the platform provide a technical description of the model's known limitations and error patterns?

Is the tool listed on a recognised higher education procurement framework (Jisc/CHEST or HEAnet)?

A vendor that cannot answer these questions with specific, documented responses should be treated with caution, regardless of how the term "human-in-the-loop" appears on their website.

Frequently asked questions

What does the EU AI Act require for AI assessment tools specifically?

The EU AI Act classifies AI systems that evaluate or assess students as high-risk under Annex III, point 3(b). High-risk systems must meet requirements including technical documentation (Article 11), transparency and information for deployers (Article 13), human oversight (Article 14), and accuracy, robustness and cybersecurity (Article 15). For assessment tools, Article 14 is the most operationally significant: it requires that qualified human oversight is built into the deployment workflow, not just available in principle. In practice, this means grades should not be treated as final until a competent human has reviewed and approved them.

What is automation bias and why does it matter for AI assessment?

Automation bias is the well-documented tendency for humans to accept automated recommendations without adequate critical scrutiny, particularly under time pressure. In assessment contexts, this means lecturers may approve AI-suggested grades without genuinely evaluating each one. That undermines the human oversight requirement and can result in students receiving grades that reflect AI errors rather than their actual performance. Blind mode and dashboard design that highlights outliers are practical countermeasures. Article 14(4)(b) of the EU AI Act explicitly requires AI systems to be designed so that the people overseeing them can remain aware of this risk.

Does using AI assessment tools change who is responsible for the final grade?

No. Academic responsibility for grades remains with the lecturer and the institution, regardless of whether AI assistance was used. This is consistent with the EU AI Act, which requires deployers to assign human oversight to people with the necessary competence, training and authority (Article 26(2)), and with standard academic regulations at virtually all UK and EU universities. The AI system is a tool that produces recommendations; the lecturer takes accountability for every grade that reaches a student. This is not a burden; it is the correct relationship between AI assistance and professional academic judgment.

How should institutions communicate AI-assisted marking to students?

The EU AI Act requires transparency at two levels: Article 13 obliges providers to give deployers clear information about a high-risk system's capabilities and limitations, and Article 26(11) requires deployers of Annex III systems that make or assist in decisions about people to inform those people that they are subject to the system. For students, this means institutions should disclose that AI assistance was used in producing feedback, while also explaining that all grades were reviewed and approved by their lecturer. Research on student preferences shows that framing AI feedback as 'AI-generated, reviewed and approved by your lecturer' produces significantly better student engagement than simply labelling feedback as 'AI-generated.'

Is blind mode required for EU AI Act compliance?

Blind mode is not explicitly required by Article 14, but it is one of the most effective practical responses to the automation bias requirement in Article 14(4)(b). If an institution uses AI-visible mode, it should be able to demonstrate through audit logs that lecturers are genuinely reviewing and adjusting AI suggestions rather than accepting them automatically. Blind mode removes the structural risk that the AI's suggestion anchors the lecturer's initial judgment, which simplifies both compliance and governance.
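Alongside review time, one audit-log signal an institution could report is how often each lecturer's final score differs from the AI suggestion. The sketch assumes a hypothetical log of `(lecturer_id, ai_score, final_score)` tuples; the shape and the function name are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def adjustment_rates(decisions: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Share of submissions where each lecturer's final score differs from the AI suggestion.

    Each decision is (lecturer_id, ai_score, final_score), a hypothetical audit-log shape.
    """
    changed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lecturer, ai_score, final_score in decisions:
        total[lecturer] += 1
        if final_score != ai_score:
            changed[lecturer] += 1
    return {lec: changed[lec] / total[lec] for lec in total}
```

A rate near zero is not proof of rubber-stamping, since the AI may simply have been accurate on that batch, but combined with very short review times it is the kind of pattern a moderator would want to examine.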

Human oversight built in from the start

Eduface holds every grade until you explicitly approve it. Blind mode, explainable scores, and EU AI Act compliance documentation are included as standard, not as optional extras.


References

Regulation (EU) 2024/1689 of the European Parliament and of the Council (EU AI Act). Articles 11, 13, 14; Annex III, point 3(b).

Fink, M. (2025). Human Oversight under Article 14 of the EU AI Act. SSRN Working Paper. Available at: ssrn.com/abstract=5147196

Flodén, J. (2025). Grading exams using large language models: A comparison between human and AI grading of exams in higher education. British Educational Research Journal. doi:10.1002/berj.4069

Frontiers in Education (2025). Human-in-the-loop assessment with AI: implications for teacher education in Ibero-American universities. doi:10.3389/feduc.2025.1710992

Hattie, J. & Timperley, H. (2007). The Power of Feedback. Review of Educational Research, 77(1), 81-112.

RegulaAI (2025). Human Oversight in AI: What 'Human-in-the-Loop' Actually Means Under EU Law. regula-ai.com


Eduface · June 2026





Frequently asked questions


Does a 'confirm to send' button satisfy Article 14?

No, not if it functions as a single-click approval of all AI-generated grades without meaningful individual review. Regulators require that the human has genuinely evaluated the AI's output, not merely acknowledged it. An interface designed to minimise the time spent in review does not satisfy the spirit of Article 14, even if it technically includes a human approval step.


Is Eduface compliant with the EU AI Act?

Yes. Eduface is built around mandatory lecturer sign-off before any grade is released to students, which directly addresses Article 14. Every AI-generated grade sits in a review queue until the lecturer explicitly approves it. The AI's reasoning is explained per submission. Both AI-visible and blind modes are available. Eduface processes all student data on EU-hosted proprietary infrastructure and publishes EU AI Act compliance documentation.
