UCL Discovery

Exploiting Neural Priors in Visual SLAM

Wang, Jingwen; (2024) Exploiting Neural Priors in Visual SLAM. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Abstract

Traditional simultaneous localisation and mapping (SLAM) has shown great performance in camera tracking and geometry reconstruction in various types of environments. However, more advanced applications, such as semantic-level understanding of the scene, hole-filling, and scene completion in unobserved regions, require some (learnt or general) priors. In this thesis we explore how specific types of prior information can be incorporated into the 3D reconstruction process: pre-learnt shape or semantic priors as well as analytical geometric priors. First, in GO-Surf we leverage neural implicit representations with general geometric priors for accurate and fast surface reconstruction from RGB-D sequences. We represent the scene as a multi-level feature grid plus two tiny MLPs that decode the features into SDF and colour. Training is supervised with rendering and pseudo-SDF losses, plus Eikonal and SDF gradient regularization that encourage surface smoothness and hole-filling. GO-Surf can optimize sequences of 1-2K frames in 15-45 minutes, more than 60 times faster than previous MLP-based methods, while maintaining on-par performance on standard benchmarks. This work is further extended to a full real-time SLAM system named Co-SLAM. Then, in DSP-SLAM we apply pre-learnt shape priors for complete object shape reconstruction. DSP-SLAM builds a rich and accurate joint map of dense 3D objects and sparse landmark points as background. Objects are detected via instance segmentation, and their shape and pose are optimised using category-specific deep shape embeddings as priors, via a novel second-order optimization. Our object-aware bundle adjustment builds a pose-graph to jointly optimize camera poses, object locations and feature points. DSP-SLAM can operate at 10 Hz on 3 different input modalities: monocular, stereo, or stereo+LiDAR.
Finally, in SeMLaPS we leverage temporal consistency and geometric priors for real-time online semantic mapping. When segmenting a new RGB-D frame, latent feature maps are re-projected from previous frames, which greatly improves 2D segmentation accuracy and temporal consistency. Next, we propose a quasi-planar over-segmentation method that groups raw 3D map elements into segments based on their surface normals. A novel 3D CNN then post-processes the labelled mesh at segment level. SeMLaPS achieves state-of-the-art semantic mapping quality and generalizes better across sensors than 3D CNNs.
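
As an illustration of the Eikonal regularization mentioned for GO-Surf, the sketch below penalises deviation of the SDF gradient norm from 1, which is the property a true signed distance field satisfies. It is not the thesis implementation: it assumes a dense NumPy grid and finite-difference gradients rather than a neural feature grid with automatic differentiation, and the names `eikonal_loss` and `sphere_sdf` are hypothetical.

```python
import numpy as np


def eikonal_loss(sdf, spacing):
    """Mean squared deviation of the SDF gradient norm from 1.

    Assumption: `sdf` is a dense 3D array sampled on a regular grid with
    the same `spacing` along every axis. Gradients are finite differences.
    """
    gx, gy, gz = np.gradient(sdf, spacing)
    grad_norm = np.sqrt(gx**2 + gy**2 + gz**2)
    return float(np.mean((grad_norm - 1.0) ** 2))


def sphere_sdf(n=32, radius=0.5):
    """Exact signed distance to a sphere, sampled on [-1, 1]^3 (toy data)."""
    axis = np.linspace(-1.0, 1.0, n)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    sdf = np.sqrt(x**2 + y**2 + z**2) - radius
    return sdf, axis[1] - axis[0]


if __name__ == "__main__":
    sdf, h = sphere_sdf()
    # A valid distance field should score close to zero; a rescaled field
    # has gradient norm 2 almost everywhere and is penalised.
    print("true SDF:  ", eikonal_loss(sdf, h))
    print("scaled SDF:", eikonal_loss(2.0 * sdf, h))
```

In a learning setting the same penalty would typically be evaluated at sampled 3D points and added to the reconstruction losses with a weighting factor; the weight and sampling strategy used in the thesis are not stated in this abstract.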

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Exploiting Neural Priors in Visual SLAM
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2024. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10193387