UCL Discovery

A spatio-temporal network for video semantic segmentation in surgical videos

Grammatikopoulou, Maria; Sanchez-Matilla, Ricardo; Bragman, Felix; Owen, David; Culshaw, Lucy; Kerr, Karen; Stoyanov, Danail (2024) A spatio-temporal network for video semantic segmentation in surgical videos. International Journal of Computer Assisted Radiology and Surgery, 19, pp. 375-382. 10.1007/s11548-023-02971-6. Green open access

2306.11052.pdf - Accepted Version

Abstract

PURPOSE: Semantic segmentation in surgical videos has applications in intra-operative guidance, post-operative analytics and surgical education. Models need to provide accurate predictions, since temporally inconsistent identification of anatomy can compromise patient safety. We propose a novel architecture for modelling temporal relationships in videos to address these issues.

METHODS: We developed a temporal segmentation model that includes a static encoder and a spatio-temporal decoder. The encoder processes individual frames, whilst the decoder learns spatio-temporal relationships from frame sequences. The decoder can be used with any suitable encoder to improve temporal consistency.

RESULTS: Model performance was evaluated on the CholecSeg8k dataset and a private dataset of robotic Partial Nephrectomy procedures. When the temporal decoder was applied, Mean Intersection over Union improved by 1.30% and 4.27% on the two datasets, respectively. Our model also displayed improvements in temporal consistency of up to 7.23%.

CONCLUSIONS: This work demonstrates an advance in video segmentation of surgical scenes, with potential applications in surgery with a view to improving patient outcomes. The proposed decoder can extend state-of-the-art static models, and it is shown to improve both per-frame segmentation output and video temporal consistency.
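
For readers unfamiliar with the Mean Intersection over Union metric reported above, the following is a minimal sketch of how it is commonly computed for a single frame. It is not the authors' evaluation code: the function name `mean_iou`, the flat-list representation of label maps, and the choice to skip classes absent from both prediction and ground truth are all assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean Intersection over Union over classes present in pred or target.

    `pred` and `target` are flattened per-pixel class labels (hypothetical
    representation; real pipelines usually operate on 2D arrays).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of pixels")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy 6-pixel frame with three classes.
    predicted = [0, 0, 1, 1, 2, 2]
    ground_truth = [0, 1, 1, 1, 2, 0]
    print(mean_iou(predicted, ground_truth, num_classes=3))
```

In the toy example the per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5. How absent classes are handled, and whether the score is averaged per frame or over a whole dataset, varies between evaluation protocols and is not specified in the abstract.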

Type: Article
Title: A spatio-temporal network for video semantic segmentation in surgical videos
Location: Germany
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s11548-023-02971-6
Publisher version: https://doi.org/10.1007/s11548-023-02971-6
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions.
Keywords: Semantic segmentation, Video segmentation
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10175450