Botev, Aleksandar (2020) The Gauss-Newton matrix for Deep Learning models and its applications. Doctoral thesis (Ph.D), UCL (University College London).
Text: Aleksandar_Botev_thesis_one_sided.pdf - Accepted Version (6MB)
Abstract
Deep Learning has recently become one of the most widely used techniques in the field of Machine Learning. Optimising these models, however, is difficult, and in order to scale training to large datasets and model sizes practitioners rely on first-order optimisation methods. One of the main challenges in applying more sophisticated second-order optimisation methods is that the curvature matrices of the loss surfaces of neural networks are usually intractable, which remains an open avenue for research. In this work, we investigate the Gauss-Newton matrix for neural networks and its applications in different areas of Machine Learning. Firstly, we analyse the structure of the Hessian and Gauss-Newton matrices for feed-forward neural networks. Several insightful results are presented, and the relationship of these two matrices to each other and to the Fisher matrix is discussed. Based on this analysis, we develop a block-diagonal Kronecker-factored approximation to the Gauss-Newton matrix. The method is experimentally validated in the context of second-order optimisation, where it achieves performance competitive with other approaches on three datasets. In the last part of this work, we investigate the application of the proposed method to constructing an approximation to the posterior distribution over the parameters of a neural network. The approximation is constructed by adapting the well-known Laplace approximation to use the Kronecker-factored Gauss-Newton approximation. Compared against Dropout, a commonly used technique for uncertainty estimation, the method achieves better uncertainty estimates on out-of-distribution data and is less susceptible to adversarial attacks. By combining the Laplace approximation with the Bayesian framework for online learning, we develop a scalable method for overcoming catastrophic forgetting, which achieves significantly better results than other approaches in the literature on several sequential learning tasks. The final chapter discusses potential future research directions that may interest the curious reader.
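Since the abstract names several concrete mathematical objects, a brief sketch may help orient the reader. The notation below is generic and illustrative, not necessarily the thesis's own; in particular, the factor symbols, the prior-precision term τ, and the multi-task penalty form are assumptions made for exposition.

```latex
% Gauss-Newton matrix: for a network f(x; \theta) with loss L(y, f),
% the Gauss-Newton matrix keeps the positive semi-definite part of
% the Hessian of the objective:
\[
  G = \mathbb{E}\!\left[\, J_\theta^\top H_L\, J_\theta \,\right],
  \qquad
  J_\theta = \frac{\partial f(x;\theta)}{\partial \theta},
  \qquad
  H_L = \frac{\partial^2 L}{\partial f\, \partial f^\top}.
\]

% Block-diagonal Kronecker-factored approximation: each layer's block
% G_\lambda is approximated by a Kronecker product of two small
% factors (an input-covariance factor and a pre-activation curvature
% factor), which makes storage and inversion cheap, since
% (A \otimes B)^{-1} = A^{-1} \otimes B^{-1}:
\[
  G_\lambda \approx Q_\lambda \otimes \mathcal{G}_\lambda .
\]

% Laplace approximation: the curvature at a trained mode \theta^*
% defines a Gaussian approximate posterior over the weights, with
% N the dataset size and \tau a prior-precision term (assumed here):
\[
  p(\theta \mid \mathcal{D}) \approx
  \mathcal{N}\!\left(\theta^*,\; (N G + \tau I)^{-1}\right).
\]

% Online learning against catastrophic forgetting: the posterior from
% task t serves as the prior for task t+1, giving a quadratic penalty
% that anchors the new solution near the old one:
\[
  \mathcal{L}_{t+1}(\theta) = L_{t+1}(\theta)
  + \tfrac{1}{2}\,(\theta - \theta_t^*)^\top \Lambda_t\, (\theta - \theta_t^*),
\]
% where \Lambda_t accumulates the (Kronecker-factored) curvature of
% tasks 1, \dots, t.
```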
| Type: | Thesis (Doctoral) |
|---|---|
| Qualification: | Ph.D |
| Title: | The Gauss-Newton matrix for Deep Learning models and its applications |
| Event: | UCL (University College London) |
| Open access status: | An open access version is available from UCL Discovery |
| Language: | English |
| Additional information: | Copyright © The Author 2020. Original content in this thesis is licensed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
| Keywords: | Deep Learning, Neural Networks, Gauss-Newton, Hessian, Curvature, Laplace, Optimization |
| UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10116494 |