Exploiting Geometrically Meaningful Intermediate Representations for Online 3D Reconstruction

Watson, Jamie (2025) Exploiting Geometrically Meaningful Intermediate Representations for Online 3D Reconstruction. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Watson_thesis.pdf - Accepted Version

Abstract

The ability of computers to understand the shape of the world is crucial in many fields, including robotics, augmented reality, and autonomous driving. Enabling this understanding from RGB images alone is highly appealing, as it broadens the range of use cases and improves accessibility. Additionally, a machine that infers 3D geometry interactively as it explores is essential for many applications where capturing an environment in advance would be problematic due to cost or the presence of dynamic objects. In this thesis we therefore investigate methods to improve 3D geometry estimation in an online setting, taking as input easily available information such as images and camera poses.

In general, online 3D geometry can be estimated either (i) as an instantaneous snapshot, such as a depth map, or (ii) as a persistent 3D model which is progressively refined as an environment is captured. We explore both modalities across four tasks. Crucially, in all cases, we find that geometrically meaningful intermediate representations either improve accuracy beyond the current state of the art, or reduce computational cost whilst maintaining high accuracy.

First, we tackle the task of self-supervised monocular depth estimation when given only videos as training data. We observe that in many cases a system has access to a sequence of images at inference time rather than just a single image, and we introduce a cost volume into the monocular depth framework. We show that the cost volume, coupled with several other key innovations, allows our model to obtain state-of-the-art results.

Next, we turn to the task of indoor 3D reconstruction from posed images. Recent learning-based methods operate in an end-to-end manner, utilising 3D convolutional networks to directly predict signed-distance values per voxel in 3D space. Instead, we relax the requirement for full 3D and target a lightweight 2½D representation, which greatly reduces the computational burden whilst retaining comparable fidelity.

For our third task, we investigate the problem of occlusion estimation for augmented reality applications. Typically, occlusions are estimated from a predicted depth map, with the predicted depth compared to the virtual depth at each pixel. Instead, we show that we can directly predict binary occlusion masks using an RGB image and a virtual asset depth render as input, leading to improved accuracy over prior depth-based approaches.

Finally, we tackle the task of online 3D planar reconstruction from posed images. Our core innovation is to train a per-scene MLP to output 3D-consistent embedding vectors, which can be used within traditional clustering algorithms to obtain accurate planar instances. Our model achieves state-of-the-art planar reconstruction accuracy, whilst also operating at interactive speeds.

We validate the effectiveness of our geometrically meaningful representations through extensive experiments on internationally recognized datasets. We conduct evaluations on existing benchmarks whenever possible, and introduce new evaluation protocols when necessary to accurately assess the performance of our models.
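As an illustration of the conventional depth-comparison baseline that the abstract contrasts against for occlusion estimation (not the thesis's own method, which predicts masks directly), the per-pixel test might be sketched as follows. The function name, the list-of-lists image layout, and the use of `None` for pixels the virtual asset does not cover are all assumptions made for this sketch.

```python
def occlusion_mask(predicted_depth, virtual_depth):
    """Return a per-pixel mask: True where the real scene hides the virtual asset.

    Both inputs are row-major lists of lists holding depths in the same units.
    A virtual depth of None marks a pixel the asset does not cover
    (an assumed convention for this sketch).
    """
    mask = []
    for pred_row, virt_row in zip(predicted_depth, virtual_depth):
        mask.append([
            # The asset is occluded when real geometry lies closer to the camera.
            virt is not None and pred < virt
            for pred, virt in zip(pred_row, virt_row)
        ])
    return mask


# Hypothetical 2x2 example: predicted scene depth vs. rendered asset depth.
predicted = [[1.0, 2.0], [3.0, 0.5]]
virtual = [[1.5, 1.5], [None, 1.5]]
mask = occlusion_mask(predicted, virtual)
```

Because the mask here depends entirely on the predicted depth, any depth error near an object boundary flips the comparison, which is one plausible reason for predicting the mask directly instead.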

Type:	Thesis (Doctoral)
Qualification:	Ph.D
Title:	Exploiting Geometrically Meaningful Intermediate Representations for Online 3D Reconstruction
Open access status:	An open access version is available from UCL Discovery
Language:	English
Additional information:	Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification:	UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery.ucl.ac.uk/id/eprint/10215094
