UCL Discovery

Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis

Jin, Yueming; Long, Yonghao; Gao, Xiaojie; Stoyanov, Danail; Dou, Qi; Heng, Pheng-Ann; (2022) Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis. International Journal of Computer Assisted Radiology and Surgery, 17 (12), pp. 2193-2202. 10.1007/s11548-022-02743-8. Green open access

Text: IJCARS22 - Yueming Jin.pdf - Accepted Version (2MB)

Abstract

Purpose: Real-time surgical workflow analysis has been a key component of computer-assisted intervention systems to improve cognitive assistance. Most existing methods rely solely on conventional temporal models and encode features in a successive spatial–temporal arrangement, so the supportive benefits of intermediate features are partially lost from both the visual and temporal aspects. In this paper, we rethink feature encoding to attend to and preserve the critical information for accurate workflow recognition and anticipation.

Methods: We introduce the Transformer to surgical workflow analysis to reconsider the complementary effects of spatial and temporal representations. We propose a hybrid embedding aggregation Transformer, named Trans-SVNet, that effectively fuses the designed spatial and temporal embeddings by employing the spatial embedding to query the temporal embedding sequence. The model is jointly optimized with loss objectives from both analysis tasks to leverage their high correlation.

Results: We extensively evaluate our method on three large surgical video datasets. Our method consistently outperforms state-of-the-art approaches on the workflow recognition task across all three datasets. When learned jointly with anticipation, recognition results gain a large improvement, and our approach also achieves promising anticipation performance. The model reaches a real-time inference speed of 0.0134 seconds per frame.

Conclusion: Experimental results demonstrate the efficacy of our hybrid embedding aggregation, which rediscovers crucial cues from complementary spatial–temporal embeddings. The improved performance under multi-task learning indicates that the anticipation task brings additional knowledge to the recognition task. The effectiveness and efficiency of our method also show its potential for use in the operating room.
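The abstract describes aggregating spatial and temporal embeddings by letting the spatial embedding of the current frame query a sequence of temporal embeddings. The sketch below illustrates that idea with standard Transformer cross-attention; it is not the authors' implementation, and the embedding size, the number of phase classes, and the use of `nn.TransformerDecoderLayer` are assumptions made only for illustration.

import torch
import torch.nn as nn

class HybridAggregationSketch(nn.Module):
    """Minimal sketch: spatial embedding (query) attends to temporal embeddings (key/value)."""

    def __init__(self, d_model=512, n_heads=8, n_phases=7):  # sizes are assumed, not from the paper
        super().__init__()
        # A single Transformer decoder layer performs cross-attention:
        # the target sequence (query) attends to the memory (key/value).
        self.decoder = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.classifier = nn.Linear(d_model, n_phases)  # per-frame workflow phase logits

    def forward(self, spatial_emb, temporal_seq):
        # spatial_emb:  (batch, 1, d_model)  embedding of the current frame
        # temporal_seq: (batch, T, d_model)  temporal embeddings of the preceding frames
        fused = self.decoder(tgt=spatial_emb, memory=temporal_seq)
        return self.classifier(fused.squeeze(1))

# Usage: one frame query against a hypothetical 10-frame temporal window.
model = HybridAggregationSketch()
logits = model(torch.randn(2, 1, 512), torch.randn(2, 10, 512))
print(logits.shape)  # torch.Size([2, 7])

In this reading, cross-attention lets each frame re-weight the temporal sequence instead of consuming a single fused spatial–temporal feature, which is one plausible way to preserve the "intermediate features" the abstract refers to.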

Type: Article
Title: Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis
Location: Germany
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s11548-022-02743-8
Publisher version: https://doi.org/10.1007/s11548-022-02743-8
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Science & Technology, Technology, Life Sciences & Biomedicine, Engineering, Biomedical, Radiology, Nuclear Medicine & Medical Imaging, Surgery, Engineering, Surgical vision, Workflow recognition, Workflow anticipation, Transformer, Spatial-temporal feature modeling, RECOGNITION, VIDEOS
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10162957
Downloads since deposit: 168
