Xu, Binxia; (2025) Towards Interpretable and Human-Aligned Artificial Intelligence Systems. Doctoral thesis (Ph.D), UCL (University College London).
Full text: Xu_10212037_thesis.pdf (63MB)
Abstract
Despite the remarkable advances of deep learning-based AI algorithms across domains and tasks, their opacity, lack of robustness, and unreliability restrict their deployment in critical areas such as healthcare and autonomous driving. This work introduces a novel approach to developing interpretable models tailored to video-based Human Activity Recognition (HAR), aimed at improving the transparency of the decision-making process. In addition, this thesis introduces measures of behavioural alignment between decision-making systems based on the analysis of their error patterns, providing insights for the further development of trustworthy and human-aligned AI systems. Video-based HAR plays a crucial role in the ecosystem of ambient intelligent systems, with significant applications such as public security surveillance and autonomous vehicles. Despite the numerous methods proposed in the literature, challenges remain in improving both the performance and the interpretability of these systems. This study proposes a graph attention network-based framework that allows flexible injection of contextual information alongside human poses to address these limitations. The framework also serves as an evaluation tool for investigating the importance of contextual information, an aspect that has not been fully explored in previous research. Beyond building more transparent and interpretable models, this work establishes novel metrics to evaluate the reliability of AI systems. Existing evaluation metrics focus predominantly on prediction accuracy, overlooking other aspects that affect a model's performance and trustworthiness. To address this gap, this work introduces a set of novel metrics, including Semantic Prediction Dispersion (SPD), Misclassification Agreement (MA), and Class-Level Error Similarity (CLES), which assess the alignment of prediction behaviour between different agents based on their error patterns.
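The abstract does not give the formal definitions of SPD, MA, or CLES, so the Python sketch below is only an illustration of how error-pattern alignment between two agents (e.g. a model and human annotators) might be quantified. The function names, the MA reading (agreement on the same wrong label among jointly misclassified samples), and the CLES reading (cosine similarity of off-diagonal confusion-matrix rows) are plausible assumptions for illustration, not the thesis's actual formulations.

```python
# Illustrative sketch only: the thesis names Misclassification Agreement (MA) and
# Class-Level Error Similarity (CLES) but the abstract does not define them; the
# formulas below are assumptions, not the author's metrics.
import numpy as np

def misclassification_agreement(y_true, pred_a, pred_b):
    """Assumed MA: among samples both agents misclassify, the fraction on which
    they agree on the same (wrong) label."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    both_wrong = (pred_a != y_true) & (pred_b != y_true)
    if not both_wrong.any():
        return 0.0  # no shared errors to compare
    return float(np.mean(pred_a[both_wrong] == pred_b[both_wrong]))

def class_level_error_similarity(y_true, pred_a, pred_b, n_classes):
    """Assumed CLES: mean cosine similarity between the per-class error
    distributions (off-diagonal confusion-matrix rows) of the two agents."""
    def error_rows(pred):
        cm = np.zeros((n_classes, n_classes))
        for t, p in zip(y_true, pred):
            if t != p:
                cm[t, p] += 1  # count only misclassifications
        return cm
    a, b = error_rows(np.asarray(pred_a)), error_rows(np.asarray(pred_b))
    sims = []
    for i in range(n_classes):
        na, nb = np.linalg.norm(a[i]), np.linalg.norm(b[i])
        if na == 0 or nb == 0:
            continue  # skip classes with no errors for one of the agents
        sims.append(a[i] @ b[i] / (na * nb))
    return float(np.mean(sims)) if sims else 0.0

# Example: compare a model's error pattern with a human annotator's.
y_true = [0, 1, 2, 2, 1, 0]
model  = [0, 2, 2, 1, 1, 1]
human  = [0, 2, 2, 0, 1, 1]
print(misclassification_agreement(y_true, model, human))          # ~0.67
print(class_level_error_similarity(y_true, model, human, 3))      # ~0.67
```

Under these assumed definitions, both metrics approach 1.0 when two agents not only make errors on the same samples but confuse the same pairs of classes, which is the kind of error-pattern alignment the abstract describes.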
| Type: | Thesis (Doctoral) |
|---|---|
| Qualification: | Ph.D |
| Title: | Towards Interpretable and Human-Aligned Artificial Intelligence Systems |
| Open access status: | An open access version is available from UCL Discovery |
| Language: | English |
| Additional information: | Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author's request. |
| UCL classification: | UCL > Provost and Vice Provost Offices > UCL SLASH > Faculty of Arts and Humanities |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10212037 |