Penn, Alan; Lin, Yubo; Chen, Ke; (2025) Image Completion Network Considering Global and Local Information. Buildings, 15 (20), Article 3746. 10.3390/buildings15203746.
Text: buildings-15-03746-v2.pdf - Published Version (1MB)
Abstract
Accurate depth image inpainting in complex urban environments remains a critical challenge due to occlusions, reflections, and sensor limitations, which often result in significant data loss. We propose a hybrid deep learning framework that explicitly combines local and global modeling through Convolutional Neural Networks (CNNs) and Transformer modules. The model employs a multi-branch parallel architecture, where the CNN branch captures fine-grained local textures and edges, while the Transformer branch models global semantic structures and long-range dependencies. We introduce an optimized attention mechanism, Agent Attention, which differs from existing efficient/linear attention methods by using learnable proxy tokens tailored for urban scene categories (e.g., façades, sky, ground). A content-guided dynamic fusion module adaptively combines multi-scale features to enhance structural alignment and texture recovery. The framework is trained with a composite loss function incorporating pixel accuracy, perceptual similarity, adversarial realism, and structural consistency. Extensive experiments on the Paris StreetView dataset demonstrate that the proposed method achieves state-of-the-art performance, outperforming existing approaches in PSNR, SSIM, and LPIPS metrics. The study highlights the potential of multi-scale modeling for urban depth inpainting and discusses challenges in real-world deployment, ethical considerations, and future directions for multimodal integration.
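For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of PSNR using its standard definition, 10·log10(MAX² / MSE). It is not taken from the article: the function name `psnr`, the nested-list image representation, and the default `max_value` of 255 are illustrative assumptions, and the authors' evaluation code may differ (for example in value range or colour handling).

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are given as nested lists of pixel rows (an assumption made
    for this sketch; real pipelines typically use array libraries).
    """
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of rows")

    squared_error = 0.0
    count = 0
    for ref_row, out_row in zip(reference, restored):
        if len(ref_row) != len(out_row):
            raise ValueError("rows must have the same length")
        for ref_px, out_px in zip(ref_row, out_row):
            squared_error += (ref_px - out_px) ** 2
            count += 1

    if count == 0:
        raise ValueError("images must not be empty")

    mse = squared_error / count
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR indicates a smaller mean squared error against the ground-truth image. SSIM and LPIPS measure structural and learned perceptual similarity respectively and are not covered by this sketch.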
| Type: | Article |
|---|---|
| Title: | Image Completion Network Considering Global and Local Information |
| Open access status: | An open access version is available from UCL Discovery |
| DOI: | 10.3390/buildings15203746 |
| Publisher version: | https://doi.org/10.3390/buildings15203746 |
| Language: | English |
| Additional information: | © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
| Keywords: | Image inpainting; depth completion; multi-scale modeling; Transformer-CNN fusion; urban scene understanding |
| UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of the Built Environment; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of the Built Environment > The Bartlett School of Architecture |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10215771 |

