FEEDBACK VS GRADING

Why AI Feedback Is Not the Same as AI Grading

Many institutions conflate the two. They are different functions, and treating them as the same leads to poor procurement decisions and unmet expectations.

By Eduface · June 2026 · 8 min read

When institutions talk about “AI grading”, they often mean quite different things. Some mean automated scoring. Some mean written comments generated by a model. Some mean both. This ambiguity is not just semantic: it shapes what tools get procured, how students benefit, and whether AI assessment actually moves the metrics that matter.

What is the difference between AI feedback and AI grading?

AI grading is the automated assignment of a score to a student submission, applying rubric criteria without human intervention. AI feedback is a written explanation of why a grade was assigned: what the student did well, what fell short, and how to improve. Many AI tools do one but not the other. Institutions that assume "AI grading" includes substantive written feedback often discover they have bought a scoring engine, not a feedback system.

What does a grade actually tell a student?

A grade tells a student where they finished. It does not tell them how to start differently next time.

Sadler (1989) made the foundational argument that marks alone are insufficient for learning: students need to understand what quality looks like, recognise the gap between their current work and that standard, and know how to close it. A numerical score communicates the outcome of an assessment. It does not provide the information needed to change behaviour.

This is not a criticism of grades. Summative assessment has a legitimate function: it records and certifies performance. But that is a different function from formative feedback, which is designed to influence future work. Treating a grade as feedback is like treating a verdict as legal advice.

The practical consequence: students who receive only a score learn what they scored. Students who receive per-criterion written comments learn what they need to do differently. These are not equivalent outcomes.

What makes feedback different from a mark?

Hattie and Timperley (2007) identified feedback as one of the most powerful influences on student achievement, reporting an average effect size of 0.79 across the meta-analyses they reviewed. Their model is specific: the main purpose of feedback is to reduce the gap between a student's current performance and a goal. A grade communicates the size of that gap. It does not explain it.

Nicol and Macfarlane-Dick (2006) extended this into seven principles of good feedback practice. In their account, good feedback helps clarify what good performance looks like, supports self-assessment, delivers high-quality information about learning, encourages dialogue with teachers and peers, builds motivation, creates opportunities to close the gap between current and desired performance, and gives teachers information they can use to shape their teaching. On its own, a percentage score does little to meet any of these principles.

The implication for AI tools is direct. An AI system that produces a grade performs a useful administrative function. An AI system that produces per-criterion written comments, tailored to a specific submission, performs a pedagogical one. These are different tasks, and they require different capabilities.

Why does the distinction matter for NSS scores?

The National Student Survey’s Assessment and Feedback questions are among the most closely watched metrics in UK higher education. What the data consistently shows is that student dissatisfaction here is not primarily about grade accuracy. Students are not protesting that their 58% should have been a 62%. They are protesting that they do not know why they received that grade, and they do not know how to perform better next time.

This means that deploying an AI tool that produces faster grades without improving the quality of written feedback is unlikely to move NSS scores. The relevant questions respond to the quality and timeliness of feedback, not to how quickly a score is produced. If an institution is procuring AI assessment technology with NSS improvement as a goal, the presence or absence of written feedback generation is not a feature consideration. It is a threshold requirement.

What should institutions look for when evaluating AI assessment tools?

The table below shows the difference in what students and institutions actually receive from each type of tool.

Dimension | AI grading only | AI feedback + grading (Eduface)
Output | Score or grade | Score plus per-criterion written comments
Student benefit | Knows what they scored | Knows what to do differently next time
NSS impact | Limited | Direct improvement potential
Formative value | Minimal | High
Lecturer review | Grade confirmation | Grade and feedback review
Tailored to submission | Not typically | Yes, individualised per student

When evaluating AI assessment tools, institutions should ask:

Does the tool produce written feedback, or only a score?
Is the feedback per-criterion, or a single summary comment?
Is the feedback tailored to the individual submission, or generated from a template?
Does a human review both the grade and the feedback before release?
Is the tool on an approved procurement framework?

The answers determine whether a tool is likely to improve student outcomes or simply reduce administrative processing time for grades, which is a narrower benefit.

How does Eduface handle both grading and feedback?

Eduface was designed from the outset to address both functions: automated scoring and written formative feedback. For every student submission, Eduface generates a per-criterion score and a set of written comments specific to that student’s work. The comments are not template text: they are generated in response to what the submission actually contains.

The process includes a human-in-the-loop at every stage. Lecturers review and approve both the grade and the written feedback before anything is released to students. This preserves academic judgement and ensures the AI output is a starting point for the lecturer, not a final determination.

Eduface operates in two modes. In blind mode, the AI produces its grade and comments first, and these stay hidden from the lecturer until they have completed their own assessment. In AI-visible mode, the AI draft is shown upfront, and the lecturer reviews, amends, and approves before release. Both modes keep the human review step.
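Both review modes come down to a release gate: no score or comment reaches a student until a lecturer has approved it. The Python sketch below is a purely illustrative model of that rule and is not Eduface's implementation. The class and method names (`ReviewMode`, `CriterionResult`, `Submission`, `draft_for_lecturer`, `approve`, `release_to_student`) are hypothetical. They show one way blind and AI-visible review could both require lecturer sign-off on per-criterion scores and comments.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ReviewMode(Enum):
    BLIND = "blind"            # AI draft hidden until the lecturer has assessed the work
    AI_VISIBLE = "ai_visible"  # AI draft shown to the lecturer from the start


@dataclass
class CriterionResult:
    """Hypothetical record: one rubric criterion with its score and written comment."""
    criterion: str
    score: float
    comment: str  # written feedback specific to this submission and criterion


@dataclass
class Submission:
    """Hypothetical model of the review workflow for one student submission."""
    mode: ReviewMode
    ai_draft: List[CriterionResult]
    lecturer_assessment: Optional[List[CriterionResult]] = None
    approved_results: Optional[List[CriterionResult]] = None

    def draft_for_lecturer(self) -> Optional[List[CriterionResult]]:
        """Return the AI draft only when the review mode allows the lecturer to see it."""
        if self.mode is ReviewMode.AI_VISIBLE or self.lecturer_assessment is not None:
            return self.ai_draft
        return None  # blind mode, lecturer has not yet submitted their own assessment

    def record_lecturer_assessment(self, results: List[CriterionResult]) -> None:
        """Store the lecturer's independent assessment (required first in blind mode)."""
        self.lecturer_assessment = results

    def approve(self, final: List[CriterionResult]) -> None:
        """The lecturer signs off both the scores and the comments for every criterion."""
        if self.mode is ReviewMode.BLIND and self.lecturer_assessment is None:
            raise ValueError("blind mode: the lecturer must assess the work before approving")
        if any(not r.comment.strip() for r in final):
            raise ValueError("every criterion needs a written comment, not just a score")
        self.approved_results = final

    def release_to_student(self) -> List[CriterionResult]:
        """Nothing reaches the student until a lecturer has approved it."""
        if self.approved_results is None:
            raise PermissionError("not yet approved by a lecturer")
        return self.approved_results


if __name__ == "__main__":
    draft = [CriterionResult("Use of evidence", 58.0,
                             "Sources are relevant but are not compared with each other.")]
    submission = Submission(ReviewMode.BLIND, draft)
    assert submission.draft_for_lecturer() is None  # hidden until the lecturer has marked
```

In this sketch, approval covers the whole per-criterion record, with score and comment together. That way a grade cannot be released without its written feedback, whichever mode is used.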

In UK pilots, Eduface has achieved 95% alignment with lecturer assessments. Pilot partners include Bath Spa University in the UK and De Haagse Hogeschool and Tilburg University in the Netherlands. The platform is approved on the Jisc/CHEST framework, which simplifies procurement for UK institutions.

Frequently asked questions

What is the difference between AI grading and AI feedback?

AI grading is automated scoring: a system applies rubric criteria and produces a numerical or letter grade. AI feedback is written explanation: what was done well, what fell short, and how to improve. Many tools do one but not the other. Students need both to understand their performance and change their behaviour.

Can AI write personalised feedback on student essays?

Yes, but quality varies between tools. Effective AI feedback should be tailored to the specific content of each submission, not drawn from a generic template. Eduface generates per-criterion comments that respond to what is actually in each student’s work, with 95% alignment to independent lecturer assessments in UK pilots.

Does AI feedback replace lecturer comments?

Not in any well-designed implementation. In Eduface, AI-generated feedback is a draft that lecturers review and approve before it reaches students. The AI performs the initial processing; the lecturer keeps the final decision on both the grade and the comments.

Which NSS questions does AI feedback most directly affect?

The NSS Assessment and feedback questions ask, among other things, how clear the marking criteria were, how fair the marking has been, how often students received feedback on time, and how often feedback helped them improve their work. The timeliness and usefulness questions respond to the speed and quality of written comments, not to grade accuracy. Tools that generate only scores are unlikely to affect them.

How does Eduface produce written feedback for each student?

Eduface analyses each submission against the rubric criteria defined by the lecturer. For each criterion, it generates a comment that reflects what that submission demonstrated. The output is submission-specific rather than template-based, and the lecturer reviews both scores and comments before release.

Summary

Grading and feedback are not the same thing, and AI tools that conflate them will not deliver what institutions expect. If the goal is to improve student outcomes and NSS scores, the relevant question is not just whether a tool can produce a grade automatically. It is whether it can produce written, individualised, per-criterion feedback that a student can act on. That is a higher bar, and it is the right one to set.

References

1. Sadler, D.R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.

2. Hattie, J. and Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112.

3. Nicol, D. and Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: a model and seven principles of good feedback practice. Studies in Higher Education, 31(2), 199–218.

See how Eduface handles both grading and feedback

Request a demo, or create a free lecturer account and run it against your own rubric.
