UCL Discovery

Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery

Cui, B; Islam, M; Bai, L; Ren, H; (2024) Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery. International Journal of Computer Assisted Radiology and Surgery 10.1007/s11548-024-03083-5. (In press). Green open access

s11548-024-03083-5.pdf - Published Version

Abstract

PURPOSE: Depth estimation in robotic surgery is vital for 3D reconstruction, surgical navigation and augmented reality visualization. Although foundation models such as DINOv2 exhibit outstanding performance in many vision tasks, including depth estimation, recent works have observed their limitations in medical and surgical domain-specific applications. This work presents a low-rank adaptation (LoRA) of a foundation model for surgical depth estimation.

METHODS: We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in endoscopic surgery. We build LoRA layers and integrate them into DINO to adapt it with surgery-specific domain knowledge instead of relying on conventional fine-tuning. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and optimize only the LoRA layers and the depth decoder to integrate features from the surgical scene.

RESULTS: Our model is extensively validated on SCARED, a MICCAI challenge dataset collected from da Vinci Xi endoscope surgery. We empirically show that Surgical-DINO significantly outperforms all the state-of-the-art models in endoscopic depth estimation tasks. Ablation studies provide evidence of the remarkable effect of our LoRA layers and adaptation.

CONCLUSION: Surgical-DINO sheds some light on the successful adaptation of foundation models to the surgical domain for depth estimation. The results give clear evidence that neither zero-shot prediction with weights pre-trained on computer vision datasets nor naive fine-tuning is sufficient to use the foundation model directly in the surgical domain.
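The idea of freezing a pre-trained weight and training only a low-rank update can be illustrated with a small sketch. This is not the authors' implementation: it follows the commonly used LoRA formulation, in which a frozen weight W is augmented by (alpha / r) * B * A, and the class name `LoRALinear`, the rank, the scaling factor `alpha`, the initialisation and the toy weight values are all assumptions made for illustration. It uses only the Python standard library.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Hypothetical linear layer: frozen weight W plus a low-rank update.

    The effective weight is W + (alpha / rank) * B @ A. Only A and B would be
    trained; W stands in for a frozen pre-trained encoder weight.
    """

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen: never updated during adaptation
        # A gets a small random init and B starts at zero (an assumed, common
        # choice), so the adapted layer initially matches the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        """Count the entries of A and B, the only trainable parameters."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy 3x4 weight standing in for one frozen projection matrix.
W = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, 3.0],
     [1.0, 1.0, 1.0, 1.0]]
layer = LoRALinear(W, rank=1, alpha=2.0)
x = [1.0, 2.0, 3.0, 4.0]
y_init = layer.forward(x)  # equals the frozen layer's output while B is zero
```

In this toy case the rank-1 update has 7 trainable entries against 12 in the frozen weight. The saving grows with layer size, which is the usual motivation for adapting a large encoder this way instead of fine-tuning every weight.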

Type: Article
Title: Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery
Location: Germany
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s11548-024-03083-5
Publisher version: http://dx.doi.org/10.1007/s11548-024-03083-5
Language: English
Additional information: © 2024 Springer Nature. This article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Keywords: Adapter learning, Depth estimation, Foundation models, Surgical scene understanding
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Med Phys and Biomedical Eng
URI: https://discovery.ucl.ac.uk/id/eprint/10189942
