Pauzi, Z.; Dodman, M.; Mavrikis, M. (2025) Automating Pedagogical Evaluation of LLM-based Conversational Agents. In: Proceedings of the Second Workshop on Automated Evaluation of Learning and Assessment Content, co-located with the 26th International Conference on Artificial Intelligence in Education (AIED 2025). CEUR: Palermo, Italy.
Abstract
With the growing adoption of large language models (LLMs) in educational settings, there is an urgent need for systematic and scalable evaluation methods. Traditional natural language generation metrics such as BLEU, ROUGE and METEOR excel at measuring surface-level linguistic quality but fall short in evaluating the interactive, adaptive nature of conversational agents' dialogue, particularly its alignment with their intended design. To address these gaps, we propose an evaluation strategy that extends beyond technical evaluation (linguistic coherence and semantic relevance). In this pilot study we compare human and LLM-based evaluation of a conversational agent, with a focus on Socratic dialogue as a specific instantiation. Early results indicate that our LLM-as-a-Judge aligns closely with human evaluators on clear, surface-level qualities such as encouragement and actionable guidance, but less closely on subtle pedagogical behaviours such as recognising errors and maintaining natural dialogue flow. These results underscore the promise of LLM-based evaluators for scalable assessment of tutoring behaviours, while highlighting the need for targeted fine-tuning and hybrid approaches to improve nuanced error detection and dialogue coherence.
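As a hypothetical illustration of the comparison the abstract describes (the rubric dimensions mirror those named above, but every function, variable and scoring choice below is an assumption for the sketch, not the paper's implementation), one might score each tutoring dialogue with an LLM judge on a per-dimension rubric and measure agreement with human raters using weighted Cohen's kappa:

```python
# Hypothetical sketch: comparing LLM-as-a-Judge scores with human ratings.
# All names here are illustrative assumptions, not the authors' code.
from sklearn.metrics import cohen_kappa_score

RUBRIC = [
    "encouragement",        # clear, surface-level quality
    "actionable_guidance",  # clear, surface-level quality
    "error_recognition",    # subtle pedagogical behaviour
    "dialogue_flow",        # subtle pedagogical behaviour
]

def llm_judge(dialogue: str, dimension: str) -> int:
    """Placeholder: prompt an LLM to rate `dialogue` on `dimension` (1-5).

    A real implementation would send a rubric-grounded prompt to a model
    and parse the returned score; here it is stubbed out.
    """
    raise NotImplementedError("swap in a call to your LLM of choice")

def agreement_per_dimension(human: dict, judge: dict) -> dict:
    """Weighted Cohen's kappa between human and LLM ratings, per dimension.

    `human` and `judge` map each rubric dimension to a list of integer
    scores, one per evaluated dialogue.
    """
    return {
        dim: cohen_kappa_score(human[dim], judge[dim], weights="quadratic")
        for dim in RUBRIC
    }

# Toy usage: fabricated scores, just to show the shape of the comparison.
human_scores = {dim: [4, 5, 3, 4] for dim in RUBRIC}
judge_scores = {dim: [4, 4, 3, 5] for dim in RUBRIC}
print(agreement_per_dimension(human_scores, judge_scores))
```

Quadratic weighting penalises larger rating disagreements more heavily, which suits ordinal rubric scores; per-dimension kappas make it easy to see where judge-human alignment is strong (e.g. encouragement) and where it is weak (e.g. error recognition).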
| Type: | Proceedings paper |
|---|---|
| Title: | Automating Pedagogical Evaluation of LLM-based Conversational Agents |
| Event: | 26th International Conference on Artificial Intelligence in Education (AIED 2025) |
| Open access status: | An open access version is available from UCL Discovery |
| Publisher version: | https://ceur-ws.org/Vol-4006/ |
| Language: | English |
| Additional information: | © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). |
| Keywords: | Pedagogical rubric, automated evaluation, Socratic dialogue, AI tutor |
| UCL classification: | UCL > Provost and Vice Provost Offices > School of Education > UCL Institute of Education > IOE - Culture, Communication and Media |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10212920 |