UCL Discovery

Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments

Joshi, A; Gupta, N; Shah, J; Bhattarai, B; Modi, A; Stoyanov, D; (2022) Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments. In: ICMI '22: Proceedings of the 2022 International Conference on Multimodal Interaction. (pp. 83-93). ACM: New York, NY, United States. Green open access

2211.03587.pdf - Accepted Version

Abstract

Real-world applications and settings involve interactions between different modalities (e.g., video, speech, text). To process such multimodal information automatically and use it in an end application, Multimodal Representation Learning (MRL) has recently emerged as an active area of research. MRL involves learning reliable and robust representations of information from heterogeneous sources and fusing them. In practice, however, the data acquired from different sources are typically noisy. In extreme cases, noise of large magnitude can completely alter the semantics of the data, leading to inconsistencies in the parallel multimodal data. In this paper, we propose a novel method for multimodal representation learning in noisy environments based on the generalized product-of-experts technique. In the proposed method, we train a separate network for each modality to assess the credibility of the information coming from that modality; the contribution of each modality is then varied dynamically when estimating the joint distribution. We evaluate our method on two challenging benchmarks from diverse domains, multimodal 3D hand-pose estimation and multimodal surgical video segmentation, and attain state-of-the-art performance on both. Extensive quantitative and qualitative evaluations show the advantages of our method over previous approaches.
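To illustrate the general idea of weighting experts by credibility, the following is a minimal sketch of generalized product-of-experts fusion. It is not the authors' implementation. It assumes that each modality's encoder outputs a diagonal Gaussian and that the per-modality credibility networks each produce one scalar score, mapped to weights that sum to 1 by a softmax. The Gaussian form, the softmax normalization, and the names `softmax` and `gpoe_fuse` are illustrative assumptions rather than details taken from the paper. Raising a Gaussian expert to a power alpha scales its precision by alpha, so a low-credibility modality contributes less to the fused mean.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Map hypothetical per-modality credibility scores to weights summing to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gpoe_fuse(
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse diagonal-Gaussian experts N(mu_i, diag(var_i)) as prod_i q_i ** alpha_i.

    Assumption: experts are diagonal Gaussians, so each dimension is fused
    independently. The weighted product is Gaussian, with precision equal to
    sum_i alpha_i / var_i and mean equal to the precision-weighted average
    of the expert means.
    """
    dim = len(means[0])
    fused_mean: List[float] = []
    fused_var: List[float] = []
    for d in range(dim):
        precision = sum(w / v[d] for w, v in zip(weights, variances))
        weighted_sum = sum(w * m[d] / v[d] for w, m, v in zip(weights, means, variances))
        fused_var.append(1.0 / precision)
        fused_mean.append(weighted_sum / precision)
    return fused_mean, fused_var


if __name__ == "__main__":
    # Two hypothetical 1-D modality posteriors that disagree (means 0.0 and 4.0).
    means = [[0.0], [4.0]]
    variances = [[1.0], [1.0]]
    # Equal credibility: the fused mean lies halfway between the experts (2.0).
    print(gpoe_fuse(means, variances, [0.5, 0.5]))
    # The second modality is judged noisy: the fused mean moves toward 0.0 (about 0.4).
    print(gpoe_fuse(means, variances, [0.9, 0.1]))
    # Credibility scores could come from per-modality networks; here they are made up.
    print(softmax([2.0, 0.0]))
```

With equal weights, the sketch reduces to an ordinary (tempered) product of experts. The per-modality weights are what allow a corrupted modality to be down-weighted instead of dominating the joint estimate.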

Type: Proceedings paper
Title: Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments
Event: ICMI '22: 2022 International Conference on Multimodal Interaction
ISBN-13: 9781450393904
Open access status: An open access version is available from UCL Discovery
DOI: 10.1145/3536221.3556596
Publisher version: https://doi.org/10.1145/3536221.3556596
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Multimodal Representations; Multimodal Fusion; Cross-modal Processing; Deep Learning Architectures; Machine Learning
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10162124