Pauzi, Z.; Dodman, M.; Mavrikis, M. (2025) Automating Pedagogical Evaluation of LLM-based Conversational Agents. In: Proceedings of the Second Workshop on Automated Evaluation of Learning and Assessment Content, co-located with the 26th International Conference on Artificial Intelligence in Education (AIED 2025). CEUR: Palermo, Italy.
Abstract
With the growing adoption of large language models (LLMs) in educational settings, there is an urgent need for systematic and scalable evaluation methods. Traditional natural language generation metrics such as BLEU, ROUGE and METEOR excel at measuring surface-level linguistic quality but fall short in evaluating the interactive, adaptive nature of conversational agents' dialogue, particularly its alignment with their intended design. To address these gaps, we propose an evaluation strategy that extends beyond technical evaluation (linguistic coherence and semantic relevance). In this pilot study we compare human and LLM-based evaluation of a conversational agent, with a focus on Socratic dialogue as a specific instantiation. Early results indicate that our LLM-as-a-Judge aligns closely with human evaluators on clear, surface-level qualities such as encouragement and actionable guidance, but less closely on subtle pedagogical behaviours such as recognising errors and maintaining natural dialogue flow. These results underscore the promise of LLM-based evaluators for scalable assessment of tutoring behaviours, while highlighting the need for targeted fine-tuning and hybrid approaches to improve nuanced error detection and dialogue coherence.
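As a hypothetical illustration of the comparison the abstract describes (the rubric dimensions mirror those named above, but every function, variable and scoring choice below is an assumption for the sketch, not the paper's implementation), one might score each tutoring dialogue with an LLM judge on a per-dimension rubric and measure agreement with human raters using weighted Cohen's kappa:

```python
# Hypothetical sketch: comparing LLM-as-a-Judge scores with human ratings.
# All names here are illustrative assumptions, not the authors' code.
from sklearn.metrics import cohen_kappa_score

RUBRIC = [
    "encouragement",        # clear, surface-level quality
    "actionable_guidance",  # clear, surface-level quality
    "error_recognition",    # subtle pedagogical behaviour
    "dialogue_flow",        # subtle pedagogical behaviour
]

def llm_judge(dialogue: str, dimension: str) -> int:
    """Placeholder: prompt an LLM to rate `dialogue` on `dimension` (1-5).

    A real implementation would send a rubric-grounded prompt to a model
    and parse the returned score; here it is stubbed out.
    """
    raise NotImplementedError("swap in a call to your LLM of choice")

def agreement_per_dimension(human: dict, judge: dict) -> dict:
    """Weighted Cohen's kappa between human and LLM ratings, per dimension.

    `human` and `judge` map each rubric dimension to a list of integer
    scores, one per evaluated dialogue.
    """
    return {
        dim: cohen_kappa_score(human[dim], judge[dim], weights="quadratic")
        for dim in RUBRIC
    }

# Toy usage: fabricated scores, just to show the shape of the comparison.
human_scores = {dim: [4, 5, 3, 4] for dim in RUBRIC}
judge_scores = {dim: [4, 4, 3, 5] for dim in RUBRIC}
print(agreement_per_dimension(human_scores, judge_scores))
```

Quadratic weighting penalises larger rating disagreements more heavily, which suits ordinal rubric scores; per-dimension kappas make it easy to see where judge-human alignment is strong (e.g. encouragement) and where it is weak (e.g. error recognition).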
| Type: | Proceedings paper |
|---|---|
| Title: | Automating Pedagogical Evaluation of LLM-based Conversational Agents |
| Event: | 26th International Conference on Artificial Intelligence in Education (AIED 2025) |
| Open access status: | An open access version is available from UCL Discovery |
| Publisher version: | https://ceur-ws.org/Vol-4006/ |
| Language: | English |
| Additional information: | © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). |
| Keywords: | Pedagogical rubric, automated evaluation, Socratic dialogue, AI tutor |
| UCL classification: | UCL > Provost and Vice Provost Offices > School of Education > UCL Institute of Education > IOE - Culture, Communication and Media |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10212920 |