Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

Bartolo, Max; Thrush, Tristan; Jia, Robin; Riedel, Sebastian; Stenetorp, Pontus; Kiela, Douwe (2021) Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 8830-8848). Association for Computational Linguistics.

Full text: 2021.emnlp-main.696.pdf (Published Version, 742kB)

Abstract

Despite recent progress, state-of-the-art question answering models remain vulnerable to a variety of adversarial attacks. While dynamic adversarial data collection, in which a human annotator tries to write examples that fool a model-in-the-loop, can improve model robustness, this process is expensive, which limits the scale of the collected data. In this work, we are the first to use synthetic adversarial data generation to make question answering models more robust to human adversaries. We develop a data generation pipeline that selects source passages, identifies candidate answers, generates questions, then finally filters or re-labels them to improve quality. Using this approach, we amplify a smaller human-written adversarial dataset to a much larger set of synthetic question-answer pairs. By incorporating our synthetic data, we improve the state-of-the-art on the AdversarialQA dataset by 3.7 F1 and improve model generalisation on nine of the twelve MRQA datasets. We further conduct a novel human-in-the-loop evaluation to show that our models are considerably more robust to new human-written adversarial examples: crowdworkers can fool our model only 8.8% of the time on average, compared to 17.6% for a model trained without synthetic data.
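
The four pipeline stages named in the abstract (passage selection, answer identification, question generation, filtering or re-labelling) can be sketched as a pluggable Python function. This is a minimal sketch under stated assumptions and not the authors' implementation. Every stage function here (`keep_passage`, `select_answers`, `generate_question` and the QA models) is a hypothetical stand-in for a trained component. The filtering rule is also an assumption chosen for illustration: keep a pair only if every QA model recovers the intended answer, and optionally re-label it when the models agree unanimously on a different answer. The toy stages at the bottom exist only to make the sketch runnable.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical stage signatures: in a real system each would wrap a trained model.
PassageFilter = Callable[[str], bool]
AnswerSelector = Callable[[str], List[str]]
QuestionGenerator = Callable[[str, str], str]
QAModel = Callable[[str, str], str]


@dataclass(frozen=True)
class QAPair:
    passage: str
    question: str
    answer: str


def synthesize(
    passages: Iterable[str],
    keep_passage: PassageFilter,
    select_answers: AnswerSelector,
    generate_question: QuestionGenerator,
    qa_models: List[QAModel],
    relabel: bool = False,
) -> List[QAPair]:
    """Select passages, pick answers, generate questions, then filter or re-label."""
    kept: List[QAPair] = []
    for passage in passages:
        if not keep_passage(passage):                      # 1. passage selection
            continue
        for answer in select_answers(passage):             # 2. candidate answers
            question = generate_question(passage, answer)  # 3. question generation
            preds = [model(passage, question) for model in qa_models]
            if all(p == answer for p in preds):            # 4a. assumed filter: every model recovers the answer
                kept.append(QAPair(passage, question, answer))
            elif relabel and len(set(preds)) == 1:         # 4b. assumed re-labelling: models agree on another answer
                kept.append(QAPair(passage, question, preds[0]))
    return kept


# --- Toy stand-ins for the trained components (all hypothetical) ---
def tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text)


def capitalised_answers(passage: str) -> List[str]:
    # First occurrences of capitalised words, skipping the passage's first token.
    toks = tokens(passage)
    return [t for i, t in enumerate(toks) if i > 0 and t[0].isupper() and toks.index(t) == i]


def next_word_question(passage: str, answer: str) -> str:
    toks = tokens(passage)
    return f"What word follows '{toks[toks.index(answer) - 1]}'?"


def next_word_reader(passage: str, question: str) -> str:
    # Answers with the word after the FIRST occurrence of the quoted cue word.
    cue = re.search(r"'(\w+)'", question).group(1)
    toks = tokens(passage)
    return toks[toks.index(cue) + 1]


def long_enough(passage: str) -> bool:
    return len(tokens(passage)) >= 3


passages = ["Ada met Grace in Paris and in London.", "Hi there."]
strict = synthesize(passages, long_enough, capitalised_answers, next_word_question, [next_word_reader])
loose = synthesize(passages, long_enough, capitalised_answers, next_word_question, [next_word_reader], relabel=True)
print([p.answer for p in strict])  # ['Grace', 'Paris']
print([p.answer for p in loose])   # ['Grace', 'Paris', 'Paris']
```

The "London" candidate exercises both branches. Its generated question ("What word follows 'in'?") is ambiguous, and the toy reader answers "Paris". The strict run therefore drops the pair, while the re-labelling run keeps it with the reader's answer. Whether to drop or re-label is a design trade-off. Re-labelling retains more data but relies on the QA models' predictions being correct.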

Type:	Proceedings paper
Title:	Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation
Event:	2021 Conference on Empirical Methods in Natural Language Processing
Open access status:	An open access version is available from UCL Discovery
DOI:	10.18653/v1/2021.emnlp-main.696
Publisher version:	http://dx.doi.org/10.18653/v1/2021.emnlp-main.696
Language:	English
Additional information:	ACL materials are Copyright © 1963–2022 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed under a Creative Commons Attribution 4.0 International License.
UCL classification:	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL
URI:	https://discovery.ucl.ac.uk/id/eprint/10145286
