UCL Discovery

Rethinking Feature Extraction: Gradient-based Localized Feature Extraction for End-to-End Surgical Downstream Tasks

Pang, Winnie; Islam, Mobarakol; Mitheran, Sai; Seenivasan, Lalithkumar; Xu, Mengya; Ren, Hongliang; (2022) Rethinking Feature Extraction: Gradient-based Localized Feature Extraction for End-to-End Surgical Downstream Tasks. IEEE Robotics and Automation Letters, 7 (4), pp. 12623-12630. 10.1109/lra.2022.3221310. Green open access

Text: Rethinking_Feature_Extraction_Gradient-based_Localized_Feature_Extraction_for_End-to-End_Surgical_Downstream_Tasks.pdf - Accepted Version (768kB)

Abstract

Several approaches have been introduced to understand surgical scenes through downstream tasks like captioning and surgical scene graph generation. However, most of them rely heavily on an independent object detector and a region-based feature extractor. Because they encompass computationally expensive detection and feature extraction models, these multi-stage methods suffer from slow inference, making them less suitable for real-time surgical applications. Downstream task performance also degrades because errors from the earlier modules of the pipeline propagate forward. This work develops a detector-free, gradient-based localized feature extraction approach that enables end-to-end model training for downstream surgical tasks such as report generation and tool-tissue interaction graph prediction. We eliminate the need for object detection or region proposal and feature extraction networks by extracting the features of interest from the discriminative regions in the feature map of the classification models. Here, the discriminative regions are localized using gradient-based localization techniques (e.g., Grad-CAM). We show that our proposed approaches enable the real-time deployment of end-to-end models for surgical downstream tasks. We extensively validate our approach on two surgical tasks: captioning and scene graph generation. The results show that our gradient-based localized feature extraction methods effectively substitute for the detector and feature extractor networks, allowing end-to-end model development with faster inference speed, essential for real-time surgical scene understanding tasks. The code is publicly available at https://github.com/PangWinnie0219/GradCAMDownstreamTask.
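The core idea in the abstract — weight a convolutional feature map by pooled gradients, threshold the resulting localization map, and pool features only from the discriminative region — can be illustrated with a minimal NumPy sketch. The function names (`grad_cam_map`, `localized_feature`), the 0.5 threshold, and the tensor shapes below are illustrative assumptions, not the authors' released implementation (see their repository for the real code).

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Grad-CAM-style localization map from one conv layer.

    activations, gradients: arrays of shape (C, H, W) for a single
    image and target class (e.g., a predicted tool class).
    """
    # Channel weights: global-average-pool the gradients (Grad-CAM).
    weights = gradients.mean(axis=(1, 2))                      # (C,)
    # Weighted sum over channels, then ReLU to keep positive evidence.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize to [0, 1] so a fixed threshold is meaningful.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam                                                 # (H, W)

def localized_feature(activations, cam, thresh=0.5):
    """Pool features only over the discriminative region (cam >= thresh),
    replacing a detector + region-based feature extractor."""
    mask = cam >= thresh                                       # (H, W) boolean
    if not mask.any():
        # Fall back to global average pooling if no region survives.
        return activations.mean(axis=(1, 2))
    return activations[:, mask].mean(axis=1)                   # (C,) feature
```

A usage sketch with random tensors standing in for a classifier's feature map and its class-score gradients:

```python
rng = np.random.default_rng(0)
acts = rng.random((8, 7, 7))            # conv feature map, C=8
grads = rng.standard_normal((8, 7, 7))  # d(class score)/d(activations)
cam = grad_cam_map(acts, grads)
feat = localized_feature(acts, cam)     # region feature for the downstream task
```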

Type: Article
Title: Rethinking Feature Extraction: Gradient-based Localized Feature Extraction for End-to-End Surgical Downstream Tasks
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/lra.2022.3221310
Publisher version: https://doi.org/10.1109/lra.2022.3221310
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Semantic Scene Understanding, Computer Vision for Medical Robotics, Medical Robots and Systems
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Med Phys and Biomedical Eng
URI: https://discovery.ucl.ac.uk/id/eprint/10159683
