UCL Discovery
UCL home » Library Services » Electronic resources » UCL Discovery

Improving Neural Question Answering with Retrieval and Generation

Lewis, Patrick Simon Hampden; (2022) Improving Neural Question Answering with Retrieval and Generation. Doctoral thesis (Ph.D), UCL (University College London). Green open access

[thumbnail of Patrick_Lewis_Thesis__for_submission_to_UCL_.pdf]
Preview
Text
Patrick_Lewis_Thesis__for_submission_to_UCL_.pdf - Accepted Version

Download (3MB) | Preview

Abstract

Text-based Question Answering (QA) is a subject of interest both for its practical applications, and as a test-bed to measure the key Artificial Intelligence competencies of Natural Language Processing (NLP) and the representation and application of knowledge. QA has progressed a great deal in recent years by adopting neural networks, the construction of large training datasets, and unsupervised pretraining. Despite these successes, QA models require large amounts of hand-annotated data, struggle to apply supplied knowledge effectively, and can be computationally ex- pensive to operate. In this thesis, we employ natural language generation and information retrieval techniques in order to explore and address these three issues. We first approach the task of Reading Comprehension (RC), with the aim of lifting the requirement for in-domain hand-annotated training data. We describe a method for inducing RC capabilities without requiring hand-annotated RC instances, and demonstrate performance on par with early supervised approaches. We then explore multi-lingual RC, and develop a dataset to evaluate methods which enable training RC models in one language, and testing them in another. Second, we explore open-domain QA (ODQA), and consider how to build mod- els which best leverage the knowledge contained in a Wikipedia text corpus. We demonstrate that retrieval-augmentation greatly improves the factual predictions of large pretrained language models in unsupervised settings. We then introduce a class of retrieval-augmented generator model, and demonstrate its strength and flexibility across a range of knowledge-intensive NLP tasks, including ODQA. Lastly, we study the relationship between memorisation and generalisation in ODQA, developing a behavioural framework based on memorisation to contextualise the performance of ODQA models. Based on these insights, we introduce a class of ODQA model based on the concept of representing knowledge as question- answer pairs, and demonstrate how, by using question generation, such models can achieve high accuracy, fast inference, and well-calibrated predictions.

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Improving Neural Question Answering with Retrieval and Generation
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2022. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL
URI: https://discovery.ucl.ac.uk/id/eprint/10151750
Downloads since deposit
65Downloads
Download activity - last month
Download activity - last 12 months
Downloads by country - last 12 months

Archive Staff Only

View Item View Item