eprintid: 10189942
rev_number: 7
eprint_status: archive
userid: 699
dir: disk0/10/18/99/42
datestamp: 2024-04-05 13:27:27
lastmod: 2024-04-05 13:27:27
status_changed: 2024-04-05 13:27:27
type: article
metadata_visibility: show
sword_depositor: 699
creators_name: Cui, B
creators_name: Islam, M
creators_name: Bai, L
creators_name: Ren, H
title: Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery
ispublished: inpress
divisions: UCL
divisions: B04
divisions: C05
divisions: F42
keywords: Adapter learning, Depth estimation, Foundation models, Surgical scene understanding
note: © 2024 Springer Nature. This article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
abstract: PURPOSE: Depth estimation in robotic surgery is vital for 3D reconstruction, surgical navigation and augmented reality visualization. Although foundation models such as DINOv2 exhibit outstanding performance in many vision tasks, including depth estimation, recent works have observed their limitations in medical and surgical domain-specific applications. This work presents a low-rank adaptation (LoRA) of a foundation model for surgical depth estimation. METHODS: We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of DINOv2 for depth estimation in endoscopic surgery. Instead of conventional fine-tuning, we build LoRA layers and integrate them into DINO to adapt it with surgery-specific domain knowledge. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and optimize only the LoRA layers and the depth decoder to integrate features from the surgical scene. RESULTS: Our model is extensively validated on SCARED, a MICCAI challenge dataset collected from da Vinci Xi endoscopic surgery. We empirically show that Surgical-DINO significantly outperforms all state-of-the-art models in endoscopic depth estimation tasks. Ablation studies provide evidence of the remarkable effect of our LoRA layers and adaptation. CONCLUSION: Surgical-DINO sheds some light on the successful adaptation of foundation models into the surgical domain for depth estimation. The results clearly show that zero-shot prediction with weights pre-trained on computer vision datasets, or naive fine-tuning, is not sufficient to use a foundation model directly in the surgical domain.
date: 2024
date_type: published
publisher: Springer Science and Business Media LLC
official_url: http://dx.doi.org/10.1007/s11548-024-03083-5
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 2256268
doi: 10.1007/s11548-024-03083-5
medium: Print-Electronic
pii: 10.1007/s11548-024-03083-5
lyricists_name: Islam, Mobarakol
lyricists_id: MISLB53
actors_name: Flynn, Bernadette
actors_id: BFFLY94
actors_role: owner
funding_acknowledgements: C4026-21G, GRF 14211420 14203323 [Hong Kong Research Grants Council (RGC) Collaborative Research Fund and General Research Fund]; 202108233000303 [Shenzhen-Hong Kong-Macau Technology Research Programme (Type C) STIC Grant SGDX20210823103535014]
full_text_status: public
publication: International Journal of Computer Assisted Radiology and Surgery
event_location: Germany
citation: Cui, B; Islam, M; Bai, L; Ren, H; (2024) Surgical-DINO: adapter learning of foundation models for depth estimation in endoscopic surgery. International Journal of Computer Assisted Radiology and Surgery 10.1007/s11548-024-03083-5 <https://doi.org/10.1007/s11548-024-03083-5>. (In press). Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10189942/1/s11548-024-03083-5.pdf