Turton, Jacob
(2024)
Deriving Semantic Features From Distributional Embeddings with an Application in Conviction Narrative Theory.
Doctoral thesis (Ph.D), UCL (University College London).
Abstract
Recent advances in Natural Language Processing (NLP) have produced models that perform at the level of human baselines across multiple benchmarks. However, improved performance has often come at the cost of model interpretability. This is noticeable in word embeddings, where individual dimensions often have no obvious meaning and embeddings tend to be interpreted in relation to each other rather than as individual objects. By contrast, work in cognitive linguistics yields highly interpretable word representations that are produced directly by human participants and are meant to mirror how humans represent and understand words in the mind. However, the vocabulary coverage of these datasets tends to be limited, as producing them is expensive and time-consuming. The Binder dataset is one such resource: it consists of words rated across 65 semantic features, all demonstrated (or strongly believed) to have direct neural correlates within the brain, but it covers only 535 words. Its small size severely limits potential uses of the Binder feature space, and expanding it through human labelling of new words would likely be costly and slow. This work instead demonstrates that it is possible to predict Binder features from distributional word embeddings, in line with previous findings [1], and that Binder features can be derived for a large new vocabulary of words. It then explores deriving Binder features for words in context using contextualised embeddings, which is important because polysemy is common in the English language. Finally, as a real-world demonstration of the utility of derived Binder features, the feature space is used to differentiate, at the level of underlying semantic meaning, a number of wordlists used to measure sentiment in financial texts. It is shown that derived Binder features can successfully identify how the meanings of the words chosen for the lists differ and that, by further splitting the lists along these lines of differentiation, improvements can be made in economic forecasting that uses them. Overall, it is hoped that this work helps encourage greater collaboration and integration between machine learning methods and insights from human-centered research.
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Deriving Semantic Features From Distributional Embeddings with an Application in Conviction Narrative Theory |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2024. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10193328 |