NViST: In the Wild New View Synthesis from a Single Image with Transformers

Jang, Wonbong; Agapito, Lourdes; (2024) NViST: In the Wild New View Synthesis from a Single Image with Transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024. (pp. 10181-10193). Institute of Electrical and Electronics Engineers (IEEE). Green open access.

Abstract

We propose NViST, a transformer-based model for efficient and generalizable novel-view synthesis from a single image for real-world scenes. In contrast to many methods that are trained on synthetic data, object-centred scenarios, or in a category-specific manner, NViST is trained on MVImgNet, a large-scale dataset of casually-captured real-world videos of hundreds of object categories with diverse backgrounds. NViST transforms image inputs directly into a radiance field, conditioned on camera parameters via adaptive layer normalisation. In practice, NViST exploits fine-tuned masked autoencoder (MAE) features and translates them to 3D output tokens via cross-attention, while addressing occlusions with self-attention. To move away from object-centred datasets and enable full scene synthesis, NViST adopts a 6-DOF camera pose model and only requires relative pose, dropping the need for canonicalization of the training data, which removes a substantial barrier to it being used on casually captured datasets. We show results on unseen objects and categories from MVImgNet and even generalization to casual phone captures. We conduct qualitative and quantitative evaluations on MVImgNet and ShapeNet to show that our model represents a step forward towards enabling true in-the-wild generalizable novel-view synthesis from a single image. Project webpage: https://wbjang.github.io/nvist_webpage.
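
The abstract states that NViST is conditioned on camera parameters via adaptive layer normalisation but does not give the formulation. The following is a minimal sketch of one common form of the technique, not the authors' implementation: the feature vector is normalised, then scaled and shifted by values predicted from a conditioning vector. The function names, the `(1 + scale)` parameterisation, and the use of single dense layers to predict scale and shift are all assumptions made for illustration.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalise a feature vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weights, bias, c):
    """Dense layer: `weights` holds one row per output feature."""
    return [sum(w * ci for w, ci in zip(row, c)) + b
            for row, b in zip(weights, bias)]


def adaptive_layer_norm(x, cond, w_scale, b_scale, w_shift, b_shift):
    """Hypothetical adaptive layer norm.

    `cond` stands in for an embedding of the camera parameters (an assumption);
    scale and shift are predicted from it and applied per feature.
    """
    scale = linear(w_scale, b_scale, cond)
    shift = linear(w_shift, b_shift, cond)
    normed = layer_norm(x)
    return [n * (1.0 + s) + t for n, s, t in zip(normed, scale, shift)]


# Usage: a 4-dimensional token conditioned on a 2-dimensional camera embedding.
token = [1.0, 2.0, 3.0, 4.0]
camera_embedding = [0.5, -0.5]
zeros_w = [[0.0, 0.0] for _ in token]
zeros_b = [0.0 for _ in token]
# With zero weights and biases the conditioning has no effect,
# so the result reduces to plain layer normalisation.
out = adaptive_layer_norm(token, camera_embedding, zeros_w, zeros_b, zeros_w, zeros_b)
```

With this parameterisation, initialising the scale and shift layers to zero makes the block start out as an ordinary layer normalisation, so the conditioning signal is introduced gradually during training.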

Type: Proceedings paper
Title: NViST: In the Wild New View Synthesis from a Single Image with Transformers
Event: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Location: Seattle, WA, USA
Dates: 16th-22nd June 2024
ISBN-13: 979-8-3503-5300-6
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/CVPR52733.2024.00970
Publisher version: https://doi.org/10.1109/cvpr52733.2024.00970
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10203953