Du, Yuyang;
Chen, Kexin;
Zhan, Yue;
Low, Chang H;
Islam, Mobarakol;
Guo, Ziyu;
Jin, Yueming;
... Heng, Pheng Ann;
(2025)
LMT++: Adaptively Collaborating LLMs with Multi-specialized Teachers for Continual VQA in Robotic Surgical Videos.
IEEE Transactions on Medical Imaging
10.1109/TMI.2025.3581108.
(In press).
Text: LMT__.pdf - Accepted Version (1MB)
Abstract
Visual question answering (VQA) plays a vital role in advancing surgical education. However, privacy concerns around patient data restrict training VQA models on previously used data, which makes an exemplar-free continual learning (CL) approach necessary. Previous CL studies in the surgical field neglected two critical issues: i) significant domain shifts caused by the wide range of surgical procedures collected from various sources, and ii) the data imbalance caused by the unequal occurrence of medical instruments or surgical procedures. This paper addresses these challenges with a multimodal large language model (LLM) and an adaptive weight assignment strategy. First, we develop a novel LLM-assisted multi-teacher CL framework (named LMT++) that harnesses the strength of a multimodal LLM as a supplementary teacher. The LLM’s strong generalization ability, together with its good understanding of the surgical domain, helps to address the knowledge gap arising from domain shifts and data imbalances. To incorporate the LLM into our CL framework, we further propose an innovative approach to processing the training data that converts complex LLM embeddings into logit values used within our CL training framework. Moreover, we design an adaptive weight assignment approach that balances the generalization ability of the LLM against the domain expertise of conventional VQA models obtained in previous training processes within the CL framework. Finally, we create a new surgical VQA dataset for model evaluation. Comprehensive experiments on these datasets show that our approach surpasses state-of-the-art CL methods.
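The abstract does not give the formulas behind the embedding-to-logits conversion or the adaptive weight assignment, so the following is only an illustrative sketch of how a two-teacher distillation target might be assembled, not the authors' implementation. Everything named here is an assumption made for illustration: the cosine-similarity scoring in `embedding_to_logits`, the entropy-based rule in `adaptive_weight`, the `scale` and `temperature` values, and all function names.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def embedding_to_logits(answer_embedding, class_embeddings, scale=10.0):
    """ASSUMPTION: score each answer class by scaled cosine similarity
    between the LLM answer embedding and a per-class embedding."""
    return [scale * cosine(answer_embedding, c) for c in class_embeddings]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def adaptive_weight(llm_probs, old_probs):
    """ASSUMPTION: give more weight to the teacher whose prediction
    is more confident (lower entropy). Returns the LLM teacher's weight."""
    conf_llm = math.exp(-entropy(llm_probs))
    conf_old = math.exp(-entropy(old_probs))
    return conf_llm / (conf_llm + conf_old)


def multi_teacher_target(llm_logits, old_logits, temperature=2.0):
    """Blend the two teachers' softened distributions into one soft target."""
    llm_p = softmax(llm_logits, temperature)
    old_p = softmax(old_logits, temperature)
    w = adaptive_weight(llm_p, old_p)
    target = [w * a + (1.0 - w) * b for a, b in zip(llm_p, old_p)]
    return w, target


def distillation_loss(student_logits, target_probs, temperature=2.0):
    """KL(target || student) on temperature-softened distributions."""
    student_p = softmax(student_logits, temperature)
    return sum(
        t * math.log(t / s) for t, s in zip(target_probs, student_p) if t > 0
    )
```

In this sketch, a confident LLM teacher paired with an uncertain old-model teacher yields a weight above 0.5 for the LLM, and the loss is zero when the student's softened distribution already matches the blended target.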
Type: | Article |
---|---|
Title: | LMT++: Adaptively Collaborating LLMs with Multi-specialized Teachers for Continual VQA in Robotic Surgical Videos |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1109/TMI.2025.3581108 |
Publisher version: | https://doi.org/10.1109/TMI.2025.3581108 |
Language: | English |
Additional information: | This version is the author-accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
Keywords: | Training, Surgery, Data models, Adaptation models, Biomedical imaging, Electronic mail, Continuing education, Visualization, Large language models, Data privacy |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Med Phys and Biomedical Eng |
URI: | https://discovery.ucl.ac.uk/id/eprint/10211482 |