UCL Discovery

Learning Transferable Representations from Multimodal Data for Multisensory Perception

Xia, Weihao; (2025) Learning Transferable Representations from Multimodal Data for Multisensory Perception. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Text: UCL_PhD_Thesis.pdf - Submitted Version (78MB)

Abstract

While generative models and multimodal large language models have revolutionized domains such as scientific discovery, media creation, and the creative arts by leveraging human capabilities in writing, hearing, and vision, the potential of other sensory modalities (such as touch, taste, smell, and brain signals) as interfaces for interaction remains largely untapped. This thesis addresses this gap by exploring transferable representations from underutilized modalities, including touch and brain signals, beyond the commonly studied vision-audio-text triad. Specifically, it investigates the unique sensory characteristics of different modalities and develops multimodal integration strategies, focusing on transferable representations that enhance multisensory processing, perception, and understanding. First, we demonstrate how pretrained image models can be adapted for video editing by modeling dynamics within the latent space. The proposed approach enables computationally efficient applications: by learning continuous trajectories, it allows the desired attributes of an entire video to be modified by editing only the initial frame and propagating the changes across the sequence. This preserves temporal coherence and eliminates the need for redundant per-frame editing. Second, we explore brain signals as a modality with DREAM, a brain-to-image reconstruction method that translates recorded brain activity into corresponding visual imagery. Grounded in foundational knowledge of the human visual system, DREAM deciphers semantic, color, and depth cues from brain data, mirroring the forward pathways from visual stimuli to brain responses. Third, unlike text, images, or audio, which align intuitively with human perception and judgment, brain responses are not directly interpretable or interoperable by humans. To bridge this gap, we propose UMBRAE, a unified multimodal brain decoding method that translates brain responses into comprehensible modalities. This provides an intuitive framework for evaluating the brain's ability to describe, recognize, and localize instances and to discern spatial relationships among multiple exemplars. Finally, we explore tactile signals, emphasizing the critical yet underexplored role of surface properties in shaping touch experiences. We introduce RETRO, a novel framework that integrates material-aware priors (pre-learned characteristics of various materials) into tactile representation learning. This improves the model's ability to capture the complexities of surface textures and to generalize across them.
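
The first contribution's central idea, editing only the initial frame and letting a learned continuous trajectory carry the change to later frames, can be illustrated with a minimal sketch. The abstract does not specify the dynamics model, so this sketch assumes that frame latents follow dz/dt = f(z, t) and are rolled out with explicit Euler steps. The names integrate_trajectory, toy_dynamics, and attribute_direction are hypothetical, and toy_dynamics is a fixed drift standing in for a learned model; this is not the thesis's implementation.

```python
import numpy as np


def integrate_trajectory(z0, dynamics, num_frames, dt=1.0):
    """Roll out one latent code per frame by explicit Euler integration
    of dz/dt = dynamics(z, t), starting from the first-frame latent z0."""
    latents = [z0]
    z = z0
    for k in range(1, num_frames):
        t = (k - 1) * dt
        z = z + dt * dynamics(z, t)  # one Euler step to the next frame
        latents.append(z)
    return np.stack(latents)  # shape: (num_frames, latent_dim)


def toy_dynamics(z, t):
    """Hypothetical stand-in for learned dynamics: a constant latent drift
    that ignores both the current state z and the time t."""
    return np.array([0.1, 0.0, -0.05, 0.0])


rng = np.random.default_rng(0)
z0 = rng.standard_normal(4)  # hypothetical first-frame latent
original = integrate_trajectory(z0, toy_dynamics, num_frames=8)

# Edit only the first frame by moving z0 along a hypothetical attribute
# direction, then re-integrate to propagate the edit to every frame.
attribute_direction = np.array([0.0, 1.0, 0.0, 0.0])
edited = integrate_trajectory(z0 + 0.5 * attribute_direction, toy_dynamics, num_frames=8)
```

Because the toy dynamics do not depend on the state, the first-frame edit is carried unchanged to every frame; with state-dependent learned dynamics, the edit would instead be transported along the flow, which is where preserving temporal coherence becomes non-trivial.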

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Learning Transferable Representations from Multimodal Data for Multisensory Perception
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/10209250