Zhi, Zhuo;
Liu, Ziquan;
Wu, Qiangqiang;
Rodrigues, Miguel;
(2024)
Wasserstein Modality Alignment Makes Your Multimodal Transformer More Robust.
In:
Proceedings of ICML 2024.
(pp. 1-11).
Proceedings of Machine Learning Research (PMLR): Vienna, Austria.
Abstract
Early fusion in a one-tower model such as a multimodal transformer is an effective multimodal learning paradigm. However, in a multimodal transformer, modality fusion is performed solely through the self-attention function, which was originally designed for unimodal token sequences. To improve the self-attention mechanism for handling multimodal input, a parametric adapter model, like the Q-Former in BLIP-2, is often used to align tokens from different modalities. Unlike existing methods that use an adapter model for modality alignment, our paper proposes an implicit approach based on the Wasserstein distance that aligns tokens from different modalities in a multimodal transformer without using any additional parameters. Our empirical study shows that this implicit modality alignment improves the effectiveness of the multimodal transformer in discriminative tasks, as well as its robustness to input noise and missing modalities. We conduct experiments on four different types of downstream task datasets, including both 2-modality and 3-modality tasks. In standard testing, testing with modality noise, and testing with missing modalities, the average improvements of our method over the baseline across all datasets are 0.9%, 2.5%, and 2.1%, respectively.
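The abstract's core idea is to pull the token distributions of different modalities toward each other by minimizing a Wasserstein distance between them, with no extra learned parameters. As a minimal illustrative sketch (not the authors' implementation; the projection count, token shapes, and the sliced-Wasserstein approximation here are assumptions), the distance between two modalities' token sets can be estimated by sorting random 1-D projections:

```python
import numpy as np

def sliced_wasserstein(tokens_a, tokens_b, n_projections=128, seed=0):
    """Approximate the Wasserstein-1 distance between two token sets
    of shape (n_tokens, dim) via random 1-D projections (sliced
    Wasserstein). Illustrative sketch only, not the paper's method;
    assumes both modalities contribute the same number of tokens.
    """
    rng = np.random.default_rng(seed)
    dim = tokens_a.shape[1]
    # Draw random unit directions on the sphere.
    dirs = rng.normal(size=(n_projections, dim))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project each token set onto every direction and sort along tokens;
    # for 1-D distributions, sorting gives the optimal transport coupling.
    proj_a = np.sort(tokens_a @ dirs.T, axis=0)
    proj_b = np.sort(tokens_b @ dirs.T, axis=0)
    # Mean absolute difference of sorted projections, averaged over slices.
    return np.mean(np.abs(proj_a - proj_b))
```

Used as an auxiliary training loss on the transformer's per-modality token embeddings, such a term is parameter-free, which matches the paper's claim of implicit alignment without an adapter model.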
Type: | Proceedings paper |
---|---|
Title: | Wasserstein Modality Alignment Makes Your Multimodal Transformer More Robust |
Event: | ICML 2024 TiFA Workshop |
Open access status: | An open access version is available from UCL Discovery |
Publisher version: | https://proceedings.mlr.press/ |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng |
URI: | https://discovery.ucl.ac.uk/id/eprint/10194484 |