UCL Discovery

Towards Machine-Assisted Meta Studies of Astrophysical Data From the Scientific Literature

Crossland, Thomas David (2023) Towards Machine-Assisted Meta Studies of Astrophysical Data From the Scientific Literature. Doctoral thesis (Ph.D), UCL (University College London). Green open access.

Machine_Assisted_Meta_Studies_Thesis_Final_Submission.pdf - Accepted Version


Abstract

We develop a new model for automatic extraction of reported measurements from the astrophysical literature, utilising modern Natural Language Processing techniques. We begin with a rules-based model for keyword-search-based extraction, and then proceed to develop artificial neural network models for full entity and relation extraction from free text. This process also requires the creation of hand-annotated datasets, selected from the available astrophysical literature, for training and validation purposes. We use a set of cosmological parameters to examine the model's ability to identify information relating to a specific parameter and to illustrate its capabilities, using the Hubble constant as a primary case study due to the well-documented history of that parameter. Our results correctly highlight the current tension present in measurements of the Hubble constant and recover the 3.5σ discrepancy, demonstrating that the models are useful for meta-studies of astrophysical measurements from a large number of publications. From the results for the other cosmological parameters we can clearly observe the historical trends in the reported values of these quantities over the past two decades, and see the impacts of landmark publications on our understanding of cosmology. The outputs of these models, when applied to the article abstracts present in the arXiv repository, constitute a database of over 231,000 astrophysical numerical measurements, relating to over 61,000 different symbolic parameter representations. Here, a measurement refers to the combination of a numerical value and an identifier (i.e. a name or symbol) that gives it physical meaning. We present an online interface (Numerical Atlas) that allows users to query and explore this database by parameter name and symbolic representation, and to download the resulting datasets for their own research.
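
As an illustration only, the idea of a rules-based, keyword-driven extraction step and of quantifying a tension between two reported values can be sketched in Python. This is a minimal sketch and not the thesis's implementation: the names `Measurement`, `extract_measurements` and `tension_sigma`, the regular expression, and all numbers are hypothetical, and the tension formula assumes independent Gaussian uncertainties.

```python
import math
import re
from typing import List, NamedTuple, Optional


class Measurement(NamedTuple):
    """A numerical value paired with the identifier that gives it meaning."""
    identifier: str
    value: float
    uncertainty: Optional[float]


# Hypothetical number pattern: optional sign, digits, optional decimal part.
NUMBER = r"[-+]?\d+(?:\.\d+)?"


def extract_measurements(text: str, keywords: List[str]) -> List[Measurement]:
    """Find 'keyword = value [± uncertainty]' patterns for each keyword.

    A toy rule: the keyword must be followed by '=', 'is' or 'of',
    then a number, then optionally '±' (or '+/-') and a second number.
    """
    found = []
    for keyword in keywords:
        pattern = re.compile(
            re.escape(keyword)
            + r"\s*(?:=|is|of)\s*(" + NUMBER + r")"
            + r"(?:\s*(?:±|\+/-)\s*(" + NUMBER + r"))?"
        )
        for match in pattern.finditer(text):
            uncertainty = float(match.group(2)) if match.group(2) else None
            found.append(Measurement(keyword, float(match.group(1)), uncertainty))
    return found


def tension_sigma(a: float, sigma_a: float, b: float, sigma_b: float) -> float:
    """Difference between two values in units of their combined uncertainty.

    Assumes independent Gaussian errors (an assumption of this sketch).
    """
    return abs(a - b) / math.sqrt(sigma_a ** 2 + sigma_b ** 2)


if __name__ == "__main__":
    # Made-up sentence and values, used purely to exercise the functions.
    sample = "We find H0 = 73.2 ± 1.3 km/s/Mpc, while Omega_m is 0.31."
    for measurement in extract_measurements(sample, ["H0", "Omega_m"]):
        print(measurement)
    print(tension_sigma(73.0, 1.0, 67.0, 1.0))
```

A rule of this kind only matches the literal phrasings it was written for, which is one motivation for moving to learned entity and relation extraction models, as the abstract describes.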

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Towards Machine-Assisted Meta Studies of Astrophysical Data From the Scientific Literature
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2022. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Space and Climate Physics
URI: https://discovery.ucl.ac.uk/id/eprint/10164519
Downloads since deposit: 54
