
Compliance & Ethics
Human-in-the-Loop AI: What It Means in Practice for Automated Assessment
What Article 14 of the EU AI Act actually requires, and how to tell whether
an AI assessment platform genuinely implements it or just claims to.
Every major AI assessment platform now advertises "human-in-the-loop" as a feature. The
phrase appears on pricing pages, in compliance statements, and in sales calls. But what it
actually means in practice varies enormously — from a lecturer who genuinely reviews
every grade before it reaches a student, to a nominal confirmation click that happens so
quickly it provides no meaningful oversight at all.
The distinction matters. The EU AI Act, which came into force in August 2024, classifies AI
assessment tools as high-risk systems and places specific legal obligations on institutions
that deploy them. "Human-in-the-loop" as a checkbox is not enough. What regulators —
and good academic governance — require is substantive human oversight: the ability to
understand, challenge, and override AI output in ways that are genuinely meaningful.
What does human-in-the-loop mean for AI assessment?
Human-in-the-loop AI assessment means a qualified lecturer reviews, can adjust, and
explicitly approves AI-generated grades and feedback before students receive them. Under
EU AI Act Article 14, this oversight must be genuine: not a rubber-stamp approval that takes
one second and leaves the AI's output unexamined, but a meaningful review in which the
educator understands the AI's recommendations and takes accountability for the final
grade.
What does "human-in-the-loop" actually mean?
The term comes from engineering, where it describes systems that incorporate human
judgment at critical decision points rather than operating fully autonomously. Applied to AI
assessment, it means the AI produces a suggested grade and feedback, and a human — in
this context, a lecturer — reviews that suggestion before it becomes the final output that
reaches the student.
At minimum, this means:
The lecturer can see every AI-suggested score and comment before release.
The lecturer can edit, override, or reject any AI output.
Student-facing grades are not released until the lecturer explicitly approves them.
The lecturer takes academic and legal responsibility for the final grade.
What it does not mean is that the AI output is shown to lecturers and then automatically
sent to students unless the lecturer actively objects. That workflow inverts the oversight
relationship: it places the burden on the human to catch AI errors rather than placing the
burden on the AI to earn human approval.
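The hold-until-approved rule described above can be sketched as a small state model. This is an illustrative sketch only, not any platform's actual implementation: the names GradeRecord, Status, override, approve, and student_view are hypothetical and chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    HELD = "held"          # AI suggestion exists; nothing is visible to the student
    RELEASED = "released"  # a lecturer has explicitly approved


@dataclass
class GradeRecord:
    """Hypothetical record for one submission (names are illustrative)."""
    submission_id: str
    ai_score: float                      # a suggestion, never a grade by itself
    final_score: Optional[float] = None  # set only through lecturer action
    edited_by: Optional[str] = None
    approved_by: Optional[str] = None
    status: Status = Status.HELD

    def override(self, lecturer: str, score: float) -> None:
        """Lecturer replaces the AI suggestion. Editing does not release."""
        if self.status is Status.RELEASED:
            raise ValueError("grade already released")
        self.final_score = score
        self.edited_by = lecturer

    def approve(self, lecturer: str) -> None:
        """Explicit approval is the only path from HELD to RELEASED."""
        if self.final_score is None:
            self.final_score = self.ai_score  # lecturer accepts the suggestion
        self.approved_by = lecturer
        self.status = Status.RELEASED

    def student_view(self) -> Optional[float]:
        """Students see no score until a lecturer has approved."""
        return self.final_score if self.status is Status.RELEASED else None
```

The point of the design is that there is no timer and no default path to RELEASED: if the lecturer does nothing, student_view keeps returning None.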
What does the EU AI Act specifically require?
The EU AI Act (Regulation 2024/1689) classifies AI systems used to evaluate, assess, and
direct students as high-risk under Annex III, point 3(b). This classification applies to AI tools
used in higher and further education. As a high-risk system, AI assessment tools must
comply with a range of obligations — including Article 14, which governs human oversight.
Article 14(4) sets out what "human oversight" means in practical terms for high-risk AI systems.
In summary, the system must be designed so that the person responsible for oversight can:
a
Understand the system's capabilities and limitations
Educators must have sufficient information to know when the AI's output can be trusted and
when it requires closer scrutiny. This requires transparency about how the model works and
what kinds of errors it tends to make.
b
Detect and address anomalies
The oversight workflow must make it reasonably easy to spot submissions where the AI grade
seems wrong — not just review a summary statistics dashboard, but actually interrogate
individual cases.
c
Remain aware of automation bias
Article 14 explicitly names the risk that humans may over-rely on AI output. A compliant
system must be designed to counter this tendency — not to encourage the human reviewer to
accept AI suggestions as defaults.
d
Correctly interpret the system's output
Grades and feedback produced by the AI must be explainable enough for the lecturer to
understand why a particular score was assigned — not just a black-box number.
e
Decide not to use the output
The lecturer must have the genuine practical ability to disregard the AI's output in any
particular situation — even to reject all AI suggestions for a batch if circumstances require it.
For UK institutions: The UK follows its own, largely principles-based approach to AI regulation that, while
distinct from the EU AI Act, shares the principle that AI systems used in sensitive contexts
require demonstrable human accountability. UK universities operating under OfS regulatory
requirements should treat EU AI Act compliance standards as a reasonable baseline.
Why "token oversight" fails the legal and ethical test
Legal scholars and regulators have already identified what they call "automation bias" as
the central challenge for human-in-the-loop AI requirements. Automation bias is the
documented tendency for humans to accept AI recommendations without critical scrutiny
— particularly when they are busy, when the AI system appears confident, and when the
cost of deviation from the AI's suggestion feels high.
A 2025 SSRN working paper by Fink on human oversight under Article 14 of the EU AI Act
makes the point directly: if a human reviewer approves AI output in under a second across
hundreds of submissions, regulators are likely to treat this as evidence that no meaningful
oversight occurred. The theoretical right to override the AI is not equivalent to the
practical exercise of that right.
"If a human operator approves AI suggestions in one second across 200
submissions, this is not oversight — it is automation bias in action. Article 14
requires meaningful review, not a nominal one."
Fink, M. (2025). Human Oversight under Article 14 of the EU AI Act. SSRN.
This creates a practical design challenge for AI assessment platforms: the interface and
workflow must be designed to encourage genuine review, not to minimise the friction
between receiving AI output and forwarding it to students. This includes:
Presenting AI grades and feedback in a way that invites scrutiny rather than passive
acceptance.
Making it easy to drill down into individual submissions.
Designing the approval step as a deliberate action, not a passive default.
Providing lecturers with enough information about the AI's reasoning to assess its
reliability on a per-submission basis.
Article 14 compliance: two contrasting workflows
Non-compliant workflow: AI grades all submissions → grades sent to students automatically → "lecturer can object" (burden on human to catch errors).
✗ Fails Article 14: the AI output is the default; the human is optional.
Compliant workflow: AI grades all submissions → grades held while the lecturer reviews → lecturer edits any scores or comments, then explicitly approves → grades released to students.
✓ Satisfies Article 14: the human approval is required; the AI output is a recommendation.
What does genuine human-in-the-loop oversight look like in practice?
In a well-designed AI assessment platform, the human review step is not an afterthought. It
is the point around which the entire workflow is organised. Here is what this looks like in
concrete operational terms.
The AI produces, the lecturer decides
The AI processes each submission against the rubric and returns a suggested score for
every criterion plus a written justification and feedback comment for the student. These are
suggestions, not grades. They sit in a review queue that only the lecturer can release. The
system is in a holding state until the lecturer acts.
The dashboard supports meaningful review
The lecturer sees a summary across the whole cohort: average scores per criterion, any
submissions flagged as outliers (unusually high or low), and a distribution view that makes
it easy to spot if the AI has been systematically lenient or strict. The lecturer can filter to
review only the outliers first, or work through the full batch submission by submission.
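One way to produce the outlier flags described above is a simple z-score check against the cohort distribution. The source does not say how outliers are defined, so the method, the threshold of 2.0, and the function names flag_outliers and review_order are assumptions made for illustration.

```python
from statistics import mean, pstdev


def flag_outliers(scores: dict[str, float], z_threshold: float = 2.0) -> list[str]:
    """Return IDs of submissions whose score is far from the cohort mean.

    z_threshold is an illustrative default, not a regulatory value.
    """
    values = list(scores.values())
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the cohort
    if sigma == 0:
        return []  # all scores identical: nothing stands out
    return sorted(
        sid for sid, s in scores.items() if abs(s - mu) / sigma >= z_threshold
    )


def review_order(scores: dict[str, float], z_threshold: float = 2.0) -> list[str]:
    """Outliers first, then everything else, so no submission is skipped."""
    flagged = flag_outliers(scores, z_threshold)
    rest = sorted(sid for sid in scores if sid not in flagged)
    return flagged + rest
```

Note that review_order only changes the sequence. Every submission still appears in the queue, which keeps outlier filtering from turning into a way to skip the rest of the batch.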
Every AI decision is explainable
For each submission, the lecturer can see not just the score but the reasoning: which
passages from the student's work the AI linked to which rubric criterion, and why a
particular score level was assigned. This is what Article 14(4)'s requirement to
correctly interpret the system's output demands in practice. A black-box score with no
explanation does not satisfy this requirement.
Override is easy, not exceptional
The interface makes it as easy to change a score as to accept it. There is no friction
introduced to discourage adjustment. If a lecturer wants to revise every single AI
suggestion, the system supports that. The approval step requires a deliberate action —
clicking to release to students — rather than a passive default that sends grades
automatically unless interrupted.
Two grading modes, one underlying principle
One design choice that has significant implications for Article 14 compliance is whether the
AI's suggestion is shown to the lecturer before or after they have formed their own
assessment. Both approaches can be compliant, but they have different effects on
automation bias.
AI-visible mode
How it works: Lecturer sees AI scores and feedback upfront; reviews, adjusts, and approves before release.
Automation bias risk: Moderate. The AI suggestion is visible but approval is required.
Best suited for: High-volume cohorts where full independent marking is impractical; experienced lecturers comfortable calibrating against AI.
Blind mode
How it works: Lecturer marks independently first; AI score is revealed only after the lecturer has recorded their assessment.
Automation bias risk: Low. The AI cannot anchor the lecturer's independent judgment.
Best suited for: High-stakes assessments; institutions with strong academic integrity concerns; contexts where marking consistency is being audited.
Blind mode directly addresses the automation bias risk that Article 14(4) flags. Research on
anchoring effects in assessment shows that when evaluators see a suggested score before
forming their own judgment, that suggestion exerts a measurable influence — even when
evaluators believe they are making an independent decision. Blind mode structurally
prevents this by separating the independent judgment from the AI calibration step.
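The separation that blind mode relies on can be enforced in the data model rather than left to interface convention. The sketch below is hypothetical (BlindReview and its method names are invented for illustration) and shows one way to keep the AI score unreadable until the lecturer's own score is on record.

```python
from typing import Optional


class BlindReview:
    """Hypothetical blind-mode record: the AI score stays locked until the
    lecturer has committed an independent score."""

    def __init__(self, submission_id: str, ai_score: float) -> None:
        self.submission_id = submission_id
        self._ai_score = ai_score  # stored, but only exposed via reveal_ai_score
        self.lecturer_score: Optional[float] = None

    def record_lecturer_score(self, score: float) -> None:
        """Commit the independent score. It cannot be changed afterwards,
        so the revealed AI score cannot pull it retroactively."""
        if self.lecturer_score is not None:
            raise ValueError("independent score already recorded")
        self.lecturer_score = score

    def reveal_ai_score(self) -> float:
        """Refuse to show the AI score before the lecturer has marked."""
        if self.lecturer_score is None:
            raise PermissionError("record your own score before viewing the AI score")
        return self._ai_score

    def discrepancy(self) -> float:
        """Absolute gap between lecturer and AI, for the calibration step."""
        ai = self.reveal_ai_score()
        return abs(self.lecturer_score - ai)
```

A large discrepancy is then a prompt for a second look at the submission, not an instruction to move toward the AI's number.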
In Eduface: Institutions can configure which mode is used at course or assignment level. Blind
mode is the recommended setting for high-stakes summative assessments. AI-visible mode
suits formative feedback cycles where speed is the primary value and the AI's first-pass
suggestions are the main time-saving mechanism.
What should institutions look for when procuring AI assessment tools?
The growing number of AI assessment tools on the market makes it difficult to distinguish
between platforms that genuinely implement human-in-the-loop as a workflow principle
and those that use the phrase as a marketing claim. When evaluating tools for
procurement, the following questions cut through the ambiguity.
✓ Are grades held in a review state until the lecturer explicitly approves them, or are they released automatically unless the lecturer intervenes?
✓ Can the lecturer see the AI's reasoning for each score, not just the final number?
✓ Does the platform offer a blind mode where the AI's suggestion is withheld until after the lecturer has made their own assessment?
✓ Is it as easy to override the AI's suggestion as to accept it, with no friction introduced to discourage adjustment?
✓ Does the platform provide EU AI Act compliance documentation covering Articles 11, 13, and 14?
✓ Is student data processed in the EU, and does the vendor sign a GDPR Data Processing Agreement?
✓ Can the platform provide a technical description of the model's known limitations and error patterns?
✓ Is the tool listed on a recognised higher education procurement framework (Jisc/CHEST or HEAnet)?
A vendor that cannot answer these questions with specific, documented responses should
be treated with caution, regardless of how the term "human-in-the-loop" appears on their
website.
Frequently asked questions
What does the EU AI Act require for AI assessment tools specifically?
The EU AI Act classifies AI systems that evaluate or assess students as high-risk under Annex
III, point 3(b). High-risk systems must comply with requirements covering technical
documentation (Article 11), transparency and information for deployers (Article 13), human oversight (Article 14), and
accuracy and robustness (Article 15). For assessment tools, Article 14 is the most operationally
significant: it requires that qualified human oversight is built into the deployment workflow, not
just available in principle. Grades cannot be treated as final until a competent human has
reviewed and approved them.
What is automation bias and why does it matter for AI assessment?
Automation bias is the well-documented tendency for humans to accept automated
recommendations without adequate critical scrutiny, particularly under time pressure. In
assessment contexts, this means lecturers may approve AI-suggested grades without genuinely
evaluating each one — which undermines the human oversight requirement and can result in
students receiving grades that reflect AI errors rather than their actual performance. Blind mode
and dashboard design that highlights outliers are practical countermeasures. Article 14(4) of the
EU AI Act explicitly requires that the people assigned to oversight are enabled to remain aware of this risk.
Does using AI assessment tools change who is responsible for the final grade?
No. Academic responsibility for grades remains with the lecturer and the institution, regardless
of whether AI assistance was used. This is consistent with the oversight role that EU AI Act Article 14 assigns to humans and with standard
academic regulations at virtually all UK and EU universities. The AI system is a tool that
produces recommendations; the lecturer takes accountability for every grade that reaches a
student. This is not a burden — it is the correct relationship between AI assistance and
professional academic judgment.
How should institutions communicate AI-assisted marking to students?
EU AI Act Article 13 requires providers to give deployers transparent information about a high-risk system, and
Article 26(11) requires deployers to inform people who are subject to decisions made or assisted by such a system.
For students, this means institutions should
disclose that AI assistance was used in producing feedback, while also explaining that all grades
were reviewed and approved by their lecturer. Research on student preferences shows that
framing AI feedback as 'AI-generated, reviewed and approved by your lecturer' produces
significantly better student engagement than simply labelling feedback as 'AI-generated.'
Is blind mode required for EU AI Act compliance?
Blind mode is not explicitly required by Article 14, but it is one of the most effective practical
responses to the automation bias provision in Article 14(4). If an institution uses AI-visible
mode, it should be able to demonstrate through audit logs that lecturers are genuinely reviewing
and adjusting AI suggestions rather than accepting them automatically. Blind mode removes the
structural risk of anchoring bias entirely, which simplifies both compliance and governance.
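As a rough illustration of the audit-log evidence mentioned above, the sketch below summarises how long submissions were open before approval and how often the AI suggestion was changed. The ReviewEvent fields, the review_summary function, and the 30-second default are all assumptions for the example; the EU AI Act sets no numeric threshold, and a low number is a signal to investigate, not proof of non-compliance.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ReviewEvent:
    """One approval taken from a hypothetical audit log."""
    lecturer: str
    seconds_open: float  # time the submission was open before approval
    ai_score: float
    final_score: float


def review_summary(events: list[ReviewEvent], min_median_seconds: float = 30.0) -> dict:
    """Summarise review behaviour for a batch of approvals.

    min_median_seconds is an illustrative default, not a legal standard.
    """
    if not events:
        return {"count": 0, "median_seconds": None,
                "adjusted_share": None, "needs_attention": False}
    med = median(e.seconds_open for e in events)
    adjusted = sum(1 for e in events if e.final_score != e.ai_score)
    return {
        "count": len(events),
        "median_seconds": med,
        "adjusted_share": adjusted / len(events),  # share of AI scores changed
        "needs_attention": med < min_median_seconds,
    }
```

A batch approved at around one second per submission with no adjustments would be flagged here, which matches the pattern the automation bias discussion warns about.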
Human oversight built in from the start
Eduface holds every grade until you explicitly approve it. Blind mode, explainable
scores, and EU AI Act compliance documentation are included as standard — not
optional extras.
References
EU AI Act: Regulation (EU) 2024/1689 of the European Parliament and of the Council. Articles 11, 13, 14; Annex III point 3(b).
Fink, M. (2025). Human Oversight under Article 14 of the EU AI Act. SSRN Working Paper. Available at: ssrn.com/abstract=5147196
Flodén, J. (2025). Grading exams using large language models: A comparison between human and AI grading of exams in higher education. British Educational Research Journal. doi:10.1002/berj.4069
Frontiers in Education (2025). Human-in-the-loop assessment with AI: implications for teacher education in Ibero-American universities. doi:10.3389/feduc.2025.1710992
Hattie, J. & Timperley, H. (2007). The Power of Feedback. Review of Educational Research, 77(1), 81-112.
RegulaAI (2025). Human Oversight in AI: What 'Human-in-the-Loop' Actually Means Under EU Law. regula-ai.com
Compliance & Ethics
Human-in-the-Loop AI: What It
Means in Practice for Automated
Assessment
What Article 14 of the EU AI Act actually requires, and
how to tell whether an AI assessment platform
genuinely implements it or just claims to.
Eduface · 9 min read · June 2026
Every major AI assessment platform now advertises
"human-in-the-loop" as a feature. But what it actually
means in practice varies enormously — from a
lecturer who genuinely reviews every grade before it
reaches a student, to a nominal confirmation click
that provides no meaningful oversight at all.
What does human-in-the-loop mean?
Human-in-the-loop AI assessment means a
qualified lecturer reviews, can adjust, and
explicitly approves AI-generated grades and
feedback before students receive them. Under
EU AI Act Article 14, this oversight must be
genuine: not a rubber stamp, but a
meaningful review in which the educator
understands the AI's recommendations and
takes accountability for the final grade.
What does "human-in-the-loop"
actually mean?
The term comes from engineering, where it describes
systems that incorporate human judgment at critical
decision points. Applied to AI assessment, it means
the AI produces a suggested grade and feedback,
and a lecturer reviews that suggestion before it
becomes the final output that reaches the student.
At minimum, this requires:
The lecturer can see every AI-suggested score
and comment before release.
The lecturer can edit, override, or reject any AI
output.
Student-facing grades are not released until the
lecturer explicitly approves them.
The lecturer takes academic and legal
responsibility for the final grade.
What it does not mean is that the AI output is shown
to lecturers and then automatically sent to students
unless the lecturer actively objects. That workflow
inverts the oversight relationship.
What the EU AI Act specifically requires
The EU AI Act (Regulation 2024/1689) classifies AI
systems used to evaluate, assess, and direct students
as high-risk under Annex III, point 3(b). Article 14 sets
out what "human oversight" means in legal terms.
The system must be designed so that the responsible
person can:
a
Understand capabilities and limitations
Educators must have sufficient information to know
when the AI's output can be trusted and when it
requires closer scrutiny.
b
Detect and address anomalies
The oversight workflow must make it easy to spot
individual submissions where the AI grade seems
wrong — not just review a summary dashboard.
c
Remain aware of automation bias
Article 14 explicitly names the risk that humans
may over-rely on AI output. A compliant system
must be designed to counter this tendency.
d
Correctly interpret the output
Grades and feedback must be explainable enough
for the lecturer to understand why a particular
score was assigned.
e
Decide not to use the output
The lecturer must have the genuine practical ability
to disregard the AI's output — including rejecting
an entire batch.
For UK institutions: The UK's own AI regulatory
framework shares the principle that high-risk AI in
sensitive contexts requires demonstrable human
accountability. UK universities should treat EU AI
Act Article 14 standards as a reasonable baseline.
Why "token oversight" fails the legal
test
Legal scholars and regulators have already identified
"automation bias" as the central challenge for human-
in-the-loop AI requirements. Automation bias is the
documented tendency for humans to accept AI
recommendations without critical scrutiny —
particularly when busy, when the AI system appears
confident, and when the cost of deviation feels high.
"If a human operator approves AI suggestions
in one second across 200 submissions, this is
not oversight — it is automation bias in action.
Article 14 requires meaningful review, not a
nominal one."
Fink, M. (2025). Human Oversight under Article 14 of
the EU AI Act. SSRN.
This creates a practical design challenge: the
interface must be designed to encourage genuine
review, not to minimise the friction between receiving
AI output and forwarding it to students. A single
"approve all" button that releases all AI-generated
grades in one click after a summary screen fails the
Article 14 standard.
What genuine HITL oversight looks like
in practice
The AI produces, the lecturer decides
The AI processes each submission against the rubric
and returns a suggested score for every criterion plus
a written feedback comment. These are suggestions,
not grades. They sit in a review queue that only the
lecturer can release.
The dashboard supports meaningful review
The lecturer sees a summary across the whole
cohort: average scores per criterion, any submissions
flagged as outliers (unusually high or low), and a
distribution view. The lecturer can filter to review only
the outliers first, or work through the full batch
submission by submission.
Every AI decision is explainable
For each submission, the lecturer can see not just the
score but the reasoning: which passages from the
student's work the AI linked to which rubric criterion,
and why a particular score level was assigned. This is
what Article 14(4)'s requirement to correctly interpret
the output demands in practice. A black-box
score with no explanation does not satisfy this
requirement.
Override is easy, not exceptional
The interface makes it as easy to change a score as
to accept it. There is no friction introduced to
discourage adjustment. The approval step requires a
deliberate action — not a passive default that sends
grades automatically unless interrupted.
Two grading modes, one underlying
principle
Whether the AI's suggestion is shown to the lecturer
before or after they have formed their own
assessment has significant implications for Article 14
compliance.
AI-visible mode
Lecturer sees AI scores upfront; reviews, adjusts,
and approves before release. Automation bias risk:
Moderate.
Best for: high-volume cohorts where full independent
marking is impractical.
Blind mode
Lecturer marks independently first; AI score is
revealed only after the lecturer has recorded their
own assessment. Automation bias risk: Low.
Best for: high-stakes summative assessments and
institutions where marking consistency is being audited.
In Eduface: Institutions can configure which mode
is used at course or assignment level. Blind mode is
the recommended setting for high-stakes
summative assessments. AI-visible mode suits
formative feedback cycles where speed is the
primary value.
Procurement checklist
When evaluating AI assessment platforms, ask:
Are grades held in a review state until the lecturer
explicitly approves them — or released automatically
unless the lecturer intervenes?
Can the lecturer see the AI's reasoning for each
score, not just the final number?
Does the platform offer a blind mode where the AI's
suggestion is withheld until after the lecturer has
made their own assessment?
Is it as easy to override the AI's suggestion as to
accept it, with no friction to discourage adjustment?
Does the platform provide EU AI Act compliance
documentation covering Articles 11, 13, and 14?
Is student data processed in the EU, and does the
vendor sign a GDPR Data Processing Agreement?
Can the platform provide a technical description of
the model's known limitations and error patterns?
Is the tool listed on a recognised higher education
procurement framework (Jisc/CHEST or HEAnet)?
A vendor that cannot answer these questions with
specific, documented responses should be treated
with caution, regardless of how the term "human-in-
the-loop" appears on their website.
Frequently asked questions
What does the EU AI Act require for AI assessment
tools specifically?
The EU AI Act classifies AI systems that evaluate or
assess students as high-risk under Annex III, point
3(b). High-risk systems must comply with
requirements covering technical documentation
(Article 11), transparency and information for deployers (Article 13), human
oversight (Article 14), and accuracy and robustness
(Article 15). For assessment tools, Article 14 is the
most operationally significant: grades cannot be
treated as final until a competent human has
reviewed and approved them.
What is automation bias and why does it matter for
AI assessment?
Automation bias is the well-documented tendency
for humans to accept automated recommendations
without adequate critical scrutiny, particularly under
time pressure. In assessment contexts, this means
lecturers may approve AI-suggested grades without
genuinely evaluating each one — which undermines
the human oversight requirement. Article 14(4) of the
EU AI Act explicitly requires that the people assigned
to oversight are enabled to remain aware of this risk.
Does a 'confirm to send' button satisfy Article 14?
No — not if it functions as a single-click approval of
all AI-generated grades without meaningful
individual review. Regulators require that the human
has genuinely evaluated the AI's output, not merely
acknowledged it. An interface designed to minimise
the time spent in review does not satisfy the spirit of
Article 14, even if it technically includes a human
approval step.
What is blind mode and why does it matter?
Blind mode is a grading workflow where the lecturer
records their own assessment of a submission
before seeing the AI's suggested grade. The AI score
is revealed only after the lecturer has committed to
their independent judgment. This structurally
prevents the anchoring effect — where seeing the
AI's number first exerts a measurable influence on
the lecturer's assessment, even when the lecturer
believes they are marking independently.
Is Eduface compliant with the EU AI Act?
Yes. Eduface is built around mandatory lecturer sign-
off before any grade is released to students, which
directly addresses Article 14. Every AI-generated
grade sits in a review queue until the lecturer
explicitly approves it. The AI's reasoning is explained
per submission. Both AI-visible and blind modes are
available. Eduface processes all student data on EU-
hosted proprietary infrastructure and publishes EU AI
Act compliance documentation.
Human-in-the-loop by design
Eduface's oversight workflow is built to meet
Article 14 — not retrofitted for compliance.