Jin, Yueming; Long, Yonghao; Gao, Xiaojie; Stoyanov, Danail; Dou, Qi; Pheng-Ann, Heng; (2022) Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis. International Journal of Computer Assisted Radiology and Surgery, 17 (12) pp. 2193-2202. 10.1007/s11548-022-02743-8.
Abstract
Purpose: Real-time surgical workflow analysis is a key component of computer-assisted intervention systems for improving cognitive assistance. Most existing methods rely solely on conventional temporal models and encode features in a successive spatial–temporal arrangement, so the supportive benefits of intermediate features are partially lost in both the visual and the temporal aspects. In this paper, we rethink feature encoding to attend to and preserve the information that is critical for accurate workflow recognition and anticipation.

Methods: We introduce the Transformer to surgical workflow analysis in order to reconsider the complementary effects of spatial and temporal representations. We propose a hybrid embedding aggregation Transformer, named Trans-SVNet, in which the designed spatial and temporal embeddings interact effectively: the spatial embedding is used to query the temporal embedding sequence. The model is jointly optimized with loss objectives from both analysis tasks to leverage their high correlation.

Results: We extensively evaluate our method on three large surgical video datasets. Our method consistently outperforms state-of-the-art methods on the workflow recognition task across all three datasets. When learned jointly with anticipation, recognition results improve by a large margin. Our approach is also effective on anticipation, achieving promising performance. Our model achieves a real-time inference speed of 0.0134 seconds per frame.

Conclusion: Experimental results demonstrate the efficacy of our hybrid embedding integration, which rediscovers crucial cues from the complementary spatial–temporal embeddings. The better performance obtained with multi-task learning indicates that the anticipation task brings additional knowledge to the recognition task. The effectiveness and efficiency of our method also show its potential for use in the operating room.
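The phrase "employing spatial embedding to query temporal embedding sequence" describes an attention-style interaction. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes single-head scaled dot-product attention with no learned projections, and the names `cross_attention`, `spatial_now` and `temporal_seq`, along with the toy vectors, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    Assumption: query, keys and values already share one dimension;
    a real model would apply learned linear projections first.
    """
    scale = math.sqrt(len(query))
    # Similarity of the query to each element of the key sequence.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Weighted sum of the value vectors, one output dimension at a time.
    dim = len(values[0])
    fused = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return fused, weights


# Hypothetical toy data: the current frame's spatial embedding (the query)
# and temporal embeddings of three frames (used as both keys and values).
spatial_now = [1.0, 0.0, 1.0, 0.0]
temporal_seq = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]

fused, weights = cross_attention(spatial_now, temporal_seq, temporal_seq)
print("attention weights:", weights)
print("fused embedding:", fused)
```

In this toy setup the first temporal embedding matches the query most closely, so it receives the largest attention weight; the weights sum to one, making the fused vector a convex combination of the temporal embeddings.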
Type: | Article |
---|---|
Title: | Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis |
Location: | Germany |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1007/s11548-022-02743-8 |
Publisher version: | https://doi.org/10.1007/s11548-022-02743-8 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
Keywords: | Science & Technology, Technology, Life Sciences & Biomedicine, Engineering, Biomedical, Radiology, Nuclear Medicine & Medical Imaging, Surgery, Engineering, Surgical vision, Workflow recognition, Workflow anticipation, Transformer, Spatial-temporal feature modeling, RECOGNITION, VIDEOS |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10162957 |



