LMT++: Adaptively Collaborating LLMs with Multi-specialized Teachers for Continual VQA in Robotic Surgical Videos



Du, Yuyang; Chen, Kexin; Zhan, Yue; Low, Chang H; Islam, Mobarakol; Guo, Ziyu; Jin, Yueming; ... Heng, Pheng Ann (2025) LMT++: Adaptively Collaborating LLMs with Multi-specialized Teachers for Continual VQA in Robotic Surgical Videos. IEEE Transactions on Medical Imaging, doi:10.1109/TMI.2025.3581108. (In press). Green open access

Full text: LMT__.pdf (Accepted Version, 1MB)

Abstract

Visual question answering (VQA) plays a vital role in advancing surgical education. However, privacy concerns over patient data restrict the reuse of previously used training data when updating VQA models, which makes an exemplar-free continual learning (CL) approach necessary. Previous CL studies in the surgical field have neglected two critical issues: i) significant domain shifts caused by the wide range of surgical procedures collected from various sources, and ii) data imbalance caused by the unequal occurrence of medical instruments or surgical procedures. This paper addresses these challenges with a multimodal large language model (LLM) and an adaptive weight assignment strategy. First, we developed a novel LLM-assisted multi-teacher CL framework (named LMT++), which harnesses the strengths of a multimodal LLM as a supplementary teacher. The LLM’s strong generalization ability and its good understanding of the surgical domain help to address the knowledge gap arising from domain shifts and data imbalances. To incorporate the LLM into our CL framework, we further proposed a novel approach to processing the training data, which converts complex LLM embeddings into logit values used within our CL training framework. Moreover, we designed an adaptive weight assignment approach that balances the generalization ability of the LLM against the domain expertise of the conventional VQA models obtained in previous training stages of the CL framework. Finally, we created a new surgical VQA dataset for model evaluation. Comprehensive experiments on these datasets show that our approach surpasses state-of-the-art CL methods.
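The abstract does not say how LLM embeddings are turned into logits or how the adaptive teacher weights are computed. The following is a minimal sketch, not the LMT++ implementation, assuming (i) the embedding-to-logit step scores an LLM answer embedding against a per-class answer embedding by cosine similarity, and (ii) each teacher (an earlier VQA model or the LLM) is weighted by the probability it assigns to the ground-truth answer. All function names and parameters (`embedding_to_logits`, `adaptive_weights`, `temperature`, `sharpness`) are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def embedding_to_logits(llm_emb: Sequence[float],
                        answer_embs: Sequence[Sequence[float]],
                        temperature: float = 0.1) -> List[float]:
    """Hypothetical conversion: similarity of the LLM's answer embedding to an
    embedding of each candidate answer class, scaled by a temperature."""
    return [cosine(llm_emb, e) / temperature for e in answer_embs]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_weights(teacher_probs: Sequence[Sequence[float]], label: int,
                     sharpness: float = 5.0) -> List[float]:
    """Hypothetical rule: teachers that are more confident in the ground-truth
    label receive larger weights (softmax over those confidences)."""
    conf = [p[label] for p in teacher_probs]
    return softmax([sharpness * c for c in conf])


def fused_target(teacher_probs: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Weighted mixture of the teachers' class distributions."""
    n = len(teacher_probs[0])
    return [sum(w * p[k] for w, p in zip(weights, teacher_probs)) for k in range(n)]


def kd_loss(student_logits: Sequence[float], target: Sequence[float],
            temperature: float = 2.0) -> float:
    """KL(target || student) using a temperature-softened student distribution."""
    q = softmax(student_logits, temperature)
    return sum(t * math.log(t / qi) for t, qi in zip(target, q) if t > 0)


if __name__ == "__main__":
    # Toy example: three answer classes with 2-D embeddings (made-up numbers).
    answers = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    llm_logits = embedding_to_logits([0.9, 0.1], answers)
    old_vqa_logits = [2.0, 0.5, 0.1]  # logits from a previously trained VQA model
    teachers = [softmax(old_vqa_logits), softmax(llm_logits)]
    weights = adaptive_weights(teachers, label=0)
    target = fused_target(teachers, weights)
    print(weights, kd_loss([1.0, 0.2, 0.3], target))
```

Weighting by ground-truth confidence is only one plausible way to let the more reliable teacher dominate on a given sample; the weighting scheme described in the paper may differ.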

Type:	Article
Title:	LMT++: Adaptively Collaborating LLMs with Multi-specialized Teachers for Continual VQA in Robotic Surgical Videos
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1109/TMI.2025.3581108
Publisher version:	https://doi.org/10.1109/TMI.2025.3581108
Language:	English
Additional information:	This version is the author-accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords:	Training, Surgery, Data models, Adaptation models, Biomedical imaging, Electronic mail, Continuing education, Visualization, Large language models, Data privacy
UCL classification:	UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Med Phys and Biomedical Eng
URI:	https://discovery.ucl.ac.uk/id/eprint/10211482

Downloads since deposit: 74
