Grazzi, Riccardo;
(2023)
Principled and Efficient Bilevel Optimization for Machine Learning.
Doctoral thesis (Ph.D), UCL (University College London).
Preview |
Text
PhD_Thesis_RG.pdf - Other Download (9MB) | Preview |
Abstract
Automatic differentiation (AD) is a core element of most modern machine learning libraries that allows to efficiently compute derivatives of a function from the corresponding program. Thanks to AD, machine learning practitioners have tackled increasingly complex learning models, such as deep neural networks with up to hundreds of billions of parameters, which are learned using the derivative (or gradient) of a loss function with respect to those parameters. While in most cases gradients can be computed exactly and relatively cheaply, in others the exact computation is either impossible or too expensive and AD must be used in combination with approximation methods. Some of these challenging scenarios arising for example in meta-learning or hyperparameter optimization, can be framed as bilevel optimization problems, where the goal is to minimize an objective function that is evaluated by first solving another optimization problem, the lower-level problem. In this work, we study efficient gradient-based bilevel optimization algorithms for machine learning problems. In particular, we establish convergence rates for some simple approaches to approximate the gradient of the bilevel objective, namely the hypergradient, when the objective is smooth and the lower-level problem consists in finding the fixed point of a contraction map. Leveraging such results, we also prove that the projected inexact hypergradient method achieves a (near) optimal rate of convergence. We establish these results for both the deterministic and stochastic settings. Additionally, we provide an efficient implementation of the methods studied and perform several numerical experiments on hyperparameter optimization, meta-learning, datapoisoning and equilibrium models, which show that our theoretical results are good indicators of the performance in practice.
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Principled and Efficient Bilevel Optimization for Machine Learning |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2023. Original content in this thesis is licensed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
UCL classification: | UCL UCL > Provost and Vice Provost Offices > UCL BEAMS UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10178090 |
Archive Staff Only
![]() |
View Item |