Williams, Jennie;
(2025)
From Knowledge to Innovation: Using NLP to Identify Innovative Activity.
Doctoral thesis (Ph.D), UCL (University College London).
Abstract
In response to the challenges posed by Brexit and the Covid-19 pandemic, the UK updated its industrial strategy to position itself as ‘a global hub for innovation’ by 2035. However, defining and measuring innovation remains complex, with standard metrics like R&D spending, patent activity, and researcher counts failing to capture the full spectrum of innovative activity, especially outside Science, Technology, Engineering, and Mathematics (STEM) fields. This highlights the need for alternative metrics to better understand and track innovation across a wider range of disciplines, including the arts, humanities, and social sciences. This thesis addresses this gap by using doctoral thesis content as a proxy for innovative activity. PhD theses, representing novel and non-trivial research, capture groundbreaking work across diverse fields. Using data from the British Library’s E-Thesis Online Service (EThOS), this research employs advanced Natural Language Processing (NLP) techniques, including word-to-document embeddings and machine learning algorithms, to process and analyse the unstructured metadata of PhD theses. The goal is to identify clusters of innovation by constructing a semantic space where research outputs are related based on their textual content. The analysis reveals innovative clusters across both STEM and non-STEM disciplines. Clusters in fields such as particle physics and photovoltaic materials highlight innovation in scientific areas, while clusters in archaeology, musicology, and urban planning demonstrate innovation outside STEM. The research also shows geographic spread and thematic cohesion within these clusters, highlighting the interdisciplinary nature of academic innovation. This study confirms that text-based analysis of doctoral research can effectively detect and classify innovative activity, offering a scalable methodology that captures the spectrum of academic innovation. The findings emphasise the potential to uncover hidden patterns and provide a richer understanding of the innovation landscape across all fields.
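
The abstract does not specify the models or parameters used, so the following is only a minimal sketch of the general idea of placing texts in a shared vector space and grouping them by similarity. It makes two loud assumptions: plain bag-of-words counts stand in for the learned word-to-document embeddings, and average-linkage hierarchical clustering on cosine distance stands in for whatever clustering the thesis applies. The function names (`embed`, `cosine`, `cluster`) and the four titles are hypothetical and invented for illustration; they are not taken from EThOS or from the thesis.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in for a learned document embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, k):
    """Average-linkage agglomerative clustering on cosine distance.

    Starts with one cluster per text and repeatedly merges the two
    closest clusters until only k clusters remain.
    """
    vecs = [embed(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]

    def linkage(c1, c2):
        # Mean pairwise cosine distance between members of two clusters.
        total = sum(1 - cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical thesis titles, invented for illustration only.
titles = [
    "perovskite photovoltaic materials for solar cells",
    "thin film photovoltaic materials and solar efficiency",
    "roman pottery in british archaeology",
    "iron age settlement archaeology in roman britain",
]
groups = cluster(titles, k=2)
print(groups)
```

On these four toy titles the two photovoltaics texts end up in one group and the two archaeology texts in the other. A real pipeline would likely replace `embed` with a trained embedding model and inspect the full merge hierarchy rather than stopping at a fixed `k`.
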
| Type: | Thesis (Doctoral) |
|---|---|
| Qualification: | Ph.D |
| Title: | From Knowledge to Innovation: Using NLP to Identify Innovative Activity |
| Open access status: | An open access version is available from UCL Discovery |
| Language: | English |
| Additional information: | Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
| Keywords: | Knowledge mapping, Innovation, word embedding, Hierarchical Cluster Analysis, Metadata |
| UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of the Built Environment; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of the Built Environment > Centre for Advanced Spatial Analysis |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10207135 |