Nazir, Saba (2025) Multimodal Compositional Distributional Semantics. Doctoral thesis (Ph.D), UCL (University College London).
Saba_Nazir_Thesis.pdf - Submitted Version (5MB)
Abstract
Representing meaning in language has long been a key challenge in natural language processing, with diverse approaches seeking to capture its complexity. Distributional semantics offers a methodology for training high-quality statistical representations of words; compositional distributional semantics extends these to longer phrases and sentences by encoding the statistics of words with function types, such as adjectives and verbs. Multimodal distributional semantics combines linguistic statistics with visual and auditory perception to ground word representations. Although this approach has been successful in word-level tasks, particularly in visual contexts, its application to compositional semantics with auditory grounding remains largely unexplored. This thesis addresses this gap by introducing a multimodal compositional distributional semantics framework that builds upon tensor-based compositional models and grounds them auditorily. To the best of our knowledge, this is the first work of its kind. The framework is evaluated on a newly developed similarity benchmark of sound-relevant adjective-noun phrases, which measures both semantic and audio similarity. Results show that (1) compositional models outperform non-compositional baselines, (2) matrix-based compositions surpass vector addition and multiplication, and (3) multimodal models improve performance over unimodal ones. Further evaluation on a multi-label sentiment classification task demonstrates improved accuracy over text-only models. Additionally, this thesis provides a general baseline for applying multimodal distributional semantics to recommendation systems and opens new avenues for future research.
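The three kinds of composition compared in the abstract (vector addition, element-wise multiplication, and matrix-based composition where the adjective acts as a linear map on the noun) can be illustrated on a toy adjective-noun phrase. This is a minimal sketch, not the thesis's implementation: the 3-dimensional vectors, the adjective matrix `LOUD`, and the `fuse` step are all hypothetical. In particular, `fuse` assumes that auditory grounding is done by concatenating L2-normalised text and audio vectors, which is one common multimodal strategy; the thesis may ground its models differently.

```python
import math

# Type aliases for plain-Python vectors and matrices (no external libraries).
Vector = list[float]
Matrix = list[list[float]]


def compose_add(adj: Vector, noun: Vector) -> Vector:
    """Additive composition: phrase = adj + noun (element-wise sum)."""
    return [a + n for a, n in zip(adj, noun)]


def compose_mult(adj: Vector, noun: Vector) -> Vector:
    """Multiplicative composition: phrase = adj * noun (element-wise product)."""
    return [a * n for a, n in zip(adj, noun)]


def compose_matrix(adj_matrix: Matrix, noun: Vector) -> Vector:
    """Matrix-based composition: the adjective is a linear map applied to the noun."""
    return [sum(w * n for w, n in zip(row, noun)) for row in adj_matrix]


def normalise(v: Vector) -> Vector:
    """Scale v to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(text: Vector, audio: Vector) -> Vector:
    """Hypothetical fusion: concatenate the L2-normalised text and audio vectors."""
    return normalise(text) + normalise(audio)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, a usual score for comparing phrase vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


# Invented 3-dimensional toy vectors; real models learn these from text and audio.
loud_vec = [0.9, 0.1, 0.3]
engine_text = [0.5, 0.2, 0.7]
engine_audio = [0.8, 0.6, 0.1]
noisy_motor = [0.7, 0.2, 0.6]  # hypothetical target phrase vector
# Hypothetical adjective matrix for "loud": stretches dimension 0, damps dimension 1.
LOUD = [[1.2, 0.0, 0.1], [0.0, 0.3, 0.0], [0.2, 0.0, 1.0]]

if __name__ == "__main__":
    candidates = {
        "add": compose_add(loud_vec, engine_text),
        "mult": compose_mult(loud_vec, engine_text),
        "matrix": compose_matrix(LOUD, engine_text),
    }
    # Score each composed "loud engine" against the toy "noisy motor" vector.
    for name, phrase in candidates.items():
        print(name, round(cosine(phrase, noisy_motor), 3))
    # Ground the noun in audio: the fused vector has text dims + audio dims.
    print("fused dims:", len(fuse(engine_text, engine_audio)))
```

The design point the sketch highlights is that addition and multiplication treat the adjective as just another vector, whereas the matrix form lets the adjective transform the noun, which is what tensor-based models exploit. The printed scores depend entirely on the invented numbers and say nothing about the benchmark results reported above.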
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Multimodal Compositional Distributional Semantics |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10212596 |