Exploration of an Open Vocabulary Model on Semantic Segmentation for Street Scene Imagery

Zeng, Zichao; Boehm, Jan (2024) Exploration of an Open Vocabulary Model on Semantic Segmentation for Street Scene Imagery. ISPRS International Journal of Geo-Information, 13 (5), Article 153. 10.3390/ijgi13050153. Green open access

ijgi-13-00153.pdf - Published Version

Abstract

This study investigates the efficacy of an open vocabulary, multi-modal foundation model for the semantic segmentation of images from complex urban street scenes. Unlike traditional models reliant on predefined category sets, Grounded SAM uses arbitrary textual inputs for category definition, offering enhanced flexibility and adaptability. The model’s performance was evaluated across single and multiple category tasks using the benchmark datasets Cityscapes, BDD100K, GTA5, and KITTI. The study focused on the impact of textual input refinement and the challenges of classifying visually similar categories. Results indicate strong performance in single-category segmentation but highlight difficulties in multi-category scenarios, particularly with categories bearing close textual or visual resemblances. Adjustments to textual prompts significantly improved detection accuracy, though challenges persisted in distinguishing between visually similar objects such as buses and trains. Comparative analysis with state-of-the-art models revealed Grounded SAM’s competitive performance, which is particularly notable given that it performs direct inference without extensive dataset-specific training. This feature is advantageous for resource-limited applications. The study concludes that while open vocabulary models such as Grounded SAM mark a significant advancement in semantic segmentation, further improvements in integrating image and text processing are essential for better performance in complex scenarios.

Type:	Article
Title:	Exploration of an Open Vocabulary Model on Semantic Segmentation for Street Scene Imagery
Open access status:	An open access version is available from UCL Discovery
DOI:	10.3390/ijgi13050153
Publisher version:	http://dx.doi.org/10.3390/ijgi13050153
Language:	English
Additional information:	Copyright © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Keywords:	Street view; semantic segmentation; foundation models; open vocabulary; multi-modal AI; GeoAI
UCL classification:	UCL
	UCL > Provost and Vice Provost Offices > UCL BEAMS
	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of the Built Environment
	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Civil, Environ and Geomatic Eng
	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of the Built Environment > Bartlett School Env, Energy and Resources
URI:	https://discovery.ucl.ac.uk/id/eprint/10191931
