UCL Discovery

Think Step by Step: Chain-of-Gesture Prompting for Error Detection in Robotic Surgical Videos

Shao, Z; Xu, J; Stoyanov, D; Mazomenos, EB; Jin, Y; (2024) Think Step by Step: Chain-of-Gesture Prompting for Error Detection in Robotic Surgical Videos. IEEE Robotics and Automation Letters, 9 (12) pp. 11513-11520. 10.1109/LRA.2024.3495452. Green open access

RAL_COG.pdf - Accepted Version

Abstract

Despite advances in robotic systems and surgical data science, ensuring safe execution in robot-assisted minimally invasive surgery (RMIS) remains challenging. Current methods for surgical error detection typically involve two stages: identifying gestures and then detecting errors within each gesture clip. These methods often overlook the rich contextual and semantic information inherent in surgical videos, and their performance is limited by their reliance on accurate gesture identification. Inspired by chain-of-thought prompting in natural language processing, this letter presents a novel, real-time, end-to-end error detection framework, Chain-of-Gesture (COG) prompting, which integrates contextual information from surgical videos step by step. The framework comprises two reasoning modules that simulate expert surgeons' decision-making: a Gestural-Visual Reasoning module, which uses transformer and attention architectures for gesture prompting, and a Multi-Scale Temporal Reasoning module, which employs a multi-stage temporal convolutional network with slow and fast paths for temporal information extraction. We validate our method on the JIGSAWS dataset and show improvements over the state of the art, achieving a 4.6% higher F1 score, 4.6% higher Accuracy, and 5.9% higher Jaccard index, with an average frame processing time of 6.69 milliseconds. This demonstrates our approach's potential to enhance RMIS safety and surgical education efficacy. The code will be available.
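
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how Accuracy, F1 score, and Jaccard index could be computed for frame-wise error detection, and how the stated 6.69 ms per frame translates into throughput. It assumes binary per-frame labels (1 = error, 0 = normal). The abstract does not specify the paper's exact evaluation protocol (for example, any averaging across gestures or tasks), so the function names and the toy labels here are illustrative assumptions, not the authors' code.

```python
def binary_metrics(y_true, y_pred):
    """Frame-wise Accuracy, F1 and Jaccard index for binary labels (1 = error).

    Assumption: one label per frame; this is not the paper's evaluation script.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # errors correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal frames flagged as errors
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # errors that were missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal frames left alone

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "f1": f1, "jaccard": jaccard}


def frames_per_second(ms_per_frame):
    """Convert an average per-frame processing time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


# Toy example with made-up labels (not data from the paper).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)

# Throughput implied by the average frame processing time stated in the abstract.
print(round(frames_per_second(6.69), 1))
```

On the toy labels this gives an Accuracy of 0.75, an F1 score of 0.75, and a Jaccard index of 0.6. An average of 6.69 ms per frame corresponds to roughly 149 frames per second, which is consistent with the real-time claim for typical video frame rates.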

Type: Article
Title: Think Step by Step: Chain-of-Gesture Prompting for Error Detection in Robotic Surgical Videos
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/LRA.2024.3495452
Publisher version: http://dx.doi.org/10.1109/lra.2024.3495452
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Videos, Surgery, Cognition, Robots, Real-time systems, Transformers, Visualization, Kinematics, Training, Semantics
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Med Phys and Biomedical Eng
URI: https://discovery.ucl.ac.uk/id/eprint/10200886