Mahfouz, Basil
(2025)
Machine Learning Methods for Mapping the Policy Impact of Science.
Doctoral thesis (Ph.D), UCL (University College London).
Mahfouz_10215690_Thesis_sigs_removed.pdf (24MB). Access restricted to UCL open access staff until 28 November 2026.
Abstract
What determines which research informs policy, and which does not? New databases comprising millions of policy documents and their scholarly citations have enabled researchers to study the science-policy interface at unprecedented scale. However, early quantitative studies of science uptake in policy compared policy-cited papers against all uncited research. This approach fails to distinguish between papers that were not cited because they lack relevance and relevant research that policymakers overlooked. To address this fundamental challenge, this thesis develops and tests four approaches to control for policy relevance. The first method treats research mapped to United Nations Sustainable Development Goals as belonging to inherently policy-relevant domains, then compares bibliometric indicators within and between these areas. The second approach matches cited papers with uncited research within the same co-citation network that exceeds a minimum threshold of textual similarity. The third method employs pretrained language models to identify highly similar research pairs in which one paper received policy citations and the other did not, creating quasi-experimental conditions for comparing different indicators. The final approach develops scalable machine learning pipelines that generate continuous policy relevance scores for individual papers by training classifiers on semantic patterns extracted from abstracts. Applying these methods to millions of research papers reveals that policymakers draw from remarkably narrow evidence sources despite abundant relevant research being available. Conventional indicators of research excellence show limited effects on policy uptake once content relevance is controlled. Academic citation counts, journal prestige, and author h-index demonstrate minimal influence on which research is cited in policy documents. Instead, media coverage emerges as a crucial amplifier of research visibility, whilst author networks prove the primary driver of policy citations. Government collaboration, surprisingly, provides minimal advantage for policy uptake. The results highlight the urgent need for reformed systems to improve evidence use in policymaking, and the thesis proposes data-driven tools for bridging the science-policy gap.
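As a rough illustration of the matching idea described in the abstract (pairing a policy-cited paper with a textually similar uncited one above a minimum similarity threshold), the sketch below uses a plain bag-of-words cosine similarity. It is not the thesis's implementation: the function names, the `threshold` value, and the bag-of-words representation are all assumptions made for illustration, and the co-citation network restriction and pretrained language models mentioned in the abstract are not modelled here.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a real text representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_pairs(cited, uncited, threshold=0.5):
    """Pair each policy-cited abstract with its most similar uncited abstract.

    `cited` and `uncited` map hypothetical paper ids to abstract text.
    A pair is kept only if its similarity meets `threshold`
    (0.5 is an arbitrary illustrative value, not one taken from the thesis).
    """
    uncited_vecs = {uid: bag_of_words(text) for uid, text in uncited.items()}
    pairs = []
    for cid, text in cited.items():
        vec = bag_of_words(text)
        best_id, best_sim = None, 0.0
        for uid, uvec in uncited_vecs.items():
            sim = cosine(vec, uvec)
            if sim > best_sim:
                best_id, best_sim = uid, sim
        if best_id is not None and best_sim >= threshold:
            pairs.append((cid, best_id, best_sim))
    return pairs
```

Calling `match_pairs` with two dictionaries of abstracts returns `(cited_id, uncited_id, similarity)` tuples. In a design like the one the abstract outlines, such matched pairs would then be compared on indicators such as citation counts or media coverage.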
| Type: | Thesis (Doctoral) |
|---|---|
| Qualification: | Ph.D |
| Title: | Machine Learning Methods for Mapping the Policy Impact of Science |
| Language: | English |
| Additional information: | Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
| UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > STEaPP |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10215690 |