Frequency-guided word substitutions for detecting textual adversarial examples


Mozes, M; Stenetorp, P; Kleinberg, B; Griffin, LD; (2021) Frequency-guided word substitutions for detecting textual adversarial examples. In: (pp. 171-186). Green open access

2021.eacl-main.13.pdf - Published Version

Abstract

Recent efforts have shown that neural text processing models are vulnerable to adversarial examples, but the nature of these examples is poorly understood. In this work, we show that adversarial attacks against CNN, LSTM and Transformer-based classification models perform word substitutions that are identifiable through frequency differences between replaced words and their corresponding substitutions. Based on these findings, we propose frequency-guided word substitutions (FGWS), a simple algorithm exploiting the frequency properties of adversarial word substitutions for the detection of adversarial examples. FGWS achieves strong performance by accurately detecting adversarial examples on the SST-2 and IMDb sentiment datasets, with F1 detection scores of up to 91.4% against RoBERTa-based classification models. We compare our approach against a recently proposed perturbation discrimination framework and show that we outperform it by up to 13.0% F1.
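The abstract describes FGWS only at a high level, so the following is a minimal Python sketch of one plausible reading, not the authors' implementation. It assumes that words rarer than a frequency threshold are swapped for their most frequent synonym, and that an input is flagged when the classifier's confidence in its original prediction then drops by more than a second threshold. The names `fgws_transform`, `is_adversarial`, `delta` and `gamma`, the frequency and synonym tables, and the toy classifier are all hypothetical and chosen for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical type for a classifier: tokens -> class probabilities.
Classifier = Callable[[Sequence[str]], List[float]]


def fgws_transform(tokens: Sequence[str],
                   freq: Dict[str, int],
                   synonyms: Dict[str, List[str]],
                   delta: int) -> List[str]:
    """Replace each word rarer than `delta` with its most frequent synonym.

    Assumption: a word is only replaced when a strictly more frequent
    synonym exists; otherwise it is left unchanged.
    """
    out = []
    for tok in tokens:
        tok_freq = freq.get(tok, 0)
        if tok_freq < delta:
            better = [s for s in synonyms.get(tok, [])
                      if freq.get(s, 0) > tok_freq]
            if better:
                tok = max(better, key=lambda s: freq.get(s, 0))
        out.append(tok)
    return out


def is_adversarial(tokens: Sequence[str],
                   predict_proba: Classifier,
                   freq: Dict[str, int],
                   synonyms: Dict[str, List[str]],
                   delta: int,
                   gamma: float) -> bool:
    """Flag the input if the transformation lowers the confidence in the
    originally predicted class by more than `gamma` (an assumed threshold)."""
    probs = predict_proba(tokens)
    label = max(range(len(probs)), key=probs.__getitem__)
    new_probs = predict_proba(fgws_transform(tokens, freq, synonyms, delta))
    return probs[label] - new_probs[label] > gamma


# Toy stand-in for a sentiment model, with a spurious positive weight on a
# rare word. Returns [P(negative), P(positive)].
WEIGHTS = {"good": 2.0, "bad": -2.0, "atrocious": 1.0}


def toy_classifier(tokens: Sequence[str]) -> List[float]:
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - p_pos, p_pos]


# Illustrative tables; real ones would come from corpus counts and a thesaurus.
FREQ = {"the": 1000, "movie": 500, "was": 900,
        "good": 400, "bad": 300, "terrible": 50, "atrocious": 2}
SYNONYMS = {"atrocious": ["bad", "terrible"]}

clean = ["the", "movie", "was", "good"]
perturbed = ["the", "movie", "was", "atrocious"]

for sentence in (clean, perturbed):
    flagged = is_adversarial(sentence, toy_classifier, FREQ, SYNONYMS,
                             delta=10, gamma=0.3)
    print(" ".join(sentence), "->", "flagged" if flagged else "not flagged")
```

In this toy setting the rare word is mapped back to a frequent synonym, the predicted class probability falls sharply, and the perturbed sentence is flagged, while the clean sentence contains no rare words and is left untouched. In practice both thresholds would need to be tuned on held-out data.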

Type:	Proceedings paper
Title:	Frequency-guided word substitutions for detecting textual adversarial examples
ISBN-13:	9781954085022
Open access status:	An open access version is available from UCL Discovery
UCL classification:	UCL
	UCL > Provost and Vice Provost Offices > UCL BEAMS
	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Security and Crime Science
URI:	https://discovery.ucl.ac.uk/id/eprint/10130496
