Understanding the extreme vulnerability of image classifiers to adversarial examples

Tanay, Thomas (2020). Understanding the extreme vulnerability of image classifiers to adversarial examples. Doctoral thesis (Ph.D), UCL (University College London). Green open access.

Abstract

State-of-the-art deep networks for image classification are vulnerable to adversarial examples: misclassified images obtained by applying imperceptible, non-random perturbations to correctly classified test images. This vulnerability is somewhat paradoxical: how can these models perform so well if they are so sensitive to small perturbations of their inputs? Two early but influential explanations focused on the high non-linearity of deep networks and on the high dimensionality of image space. We review these explanations and highlight their limitations, before introducing a new perspective according to which adversarial examples exist when the classification boundary lies close to the manifold of normal data. We present a detailed mathematical analysis of this perspective in binary linear classification, where the adversarial vulnerability of a classifier can be reduced to the deviation angle between its weight vector and the weight vector of the nearest centroid classifier. This analysis leads us to identify two types of adversarial examples: those affecting optimal classifiers, which are limited by a fundamental robustness/accuracy trade-off, and those affecting sub-optimal classifiers, resulting from imperfect training procedures or overfitting. We then show that L2 regularization plays an important role in practice, by acting as a balancing mechanism between two objectives: the minimization of the error and the maximization of the adversarial distance over the training set. We finally generalize our considerations to deep neural networks, reinterpreting in particular weight decay and adversarial training as belonging to the same family of output regularizers. Although designing models that are robust to small image perturbations remains challenging, we show in the last chapter of this thesis that state-of-the-art networks can easily be made more vulnerable. Reversing the problem in this way exposes new attack scenarios and, crucially, helps improve our understanding of the adversarial example phenomenon by emphasizing the role played by low-variance directions.
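
The binary linear case described in the abstract can be illustrated with a small sketch. The code below is not taken from the thesis: the toy data, the function names, and the use of the mean point-to-hyperplane distance as the "adversarial distance" are all assumptions made for illustration. It builds a nearest centroid classifier, tilts a second linear classifier away from it by a chosen angle toward a low-variance direction, and compares how far the data lies from each boundary.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def mean_vector(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def nearest_centroid_classifier(class0, class1):
    """Linear form (w, b) of the nearest centroid classifier.

    The boundary is the hyperplane through the midpoint of the two class
    means, orthogonal to the difference of the means.
    """
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    w = [b - a for a, b in zip(mu0, mu1)]
    midpoint = [(a + b) / 2 for a, b in zip(mu0, mu1)]
    return w, -dot(w, midpoint)

def deviation_angle(w, w_centroid):
    """Angle in radians between two weight vectors."""
    cos = dot(w, w_centroid) / (norm(w) * norm(w_centroid))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding

def distance_to_boundary(x, w, b):
    """Euclidean distance from x to the hyperplane w.x + b = 0."""
    return abs(dot(w, x) + b) / norm(w)

def accuracy(w, b, class0, class1):
    """Fraction of points on the correct side (class1 is the positive side)."""
    correct = sum(dot(w, x) + b < 0 for x in class0)
    correct += sum(dot(w, x) + b > 0 for x in class1)
    return correct / (len(class0) + len(class1))

def mean_distance(w, b, points):
    return sum(distance_to_boundary(x, w, b) for x in points) / len(points)

# Hypothetical toy data: large spread along the first axis,
# very small spread along the second (a low-variance direction).
class0 = [[-1.0, 0.05], [-1.0, -0.05], [-2.0, 0.05], [-2.0, -0.05]]
class1 = [[1.0, 0.05], [1.0, -0.05], [2.0, 0.05], [2.0, -0.05]]
points = class0 + class1

w_centroid, b_centroid = nearest_centroid_classifier(class0, class1)

# A second classifier whose weight vector is rotated by 80 degrees
# toward the low-variance axis (an arbitrary illustrative choice).
theta = math.radians(80.0)
w_tilted, b_tilted = [math.cos(theta), math.sin(theta)], 0.0

angle = deviation_angle(w_tilted, w_centroid)
mean_dist_centroid = mean_distance(w_centroid, b_centroid, points)
mean_dist_tilted = mean_distance(w_tilted, b_tilted, points)

print("deviation angle (degrees):", math.degrees(angle))
print("accuracy, centroid / tilted:",
      accuracy(w_centroid, b_centroid, class0, class1),
      accuracy(w_tilted, b_tilted, class0, class1))
print("mean distance to boundary, centroid / tilted:",
      mean_dist_centroid, mean_dist_tilted)
```

On this toy set both classifiers separate the two classes, yet the tilted boundary passes much closer to the data, which is the sense in which a larger deviation angle corresponds to greater adversarial vulnerability. The numbers are illustrative only and do not reproduce any result from the thesis.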

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Understanding the extreme vulnerability of image classifiers to adversarial examples
Event: UCL (University College London)
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2020. Original content in this thesis is licensed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL
UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10112218