Principled and Efficient Bilevel Optimization for Machine Learning

Advanced search
Browse by:

Department | Year

UCL Theses | Latest

Deposit your research

Bookmark & Share

Principled and Efficient Bilevel Optimization for Machine Learning

Grazzi, Riccardo; (2023) Principled and Efficient Bilevel Optimization for Machine Learning. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Preview

Text
PhD_Thesis_RG.pdf - Other
Download (9MB) | Preview

Abstract

Automatic differentiation (AD) is a core element of most modern machine learning libraries that allows to efficiently compute derivatives of a function from the corresponding program. Thanks to AD, machine learning practitioners have tackled increasingly complex learning models, such as deep neural networks with up to hundreds of billions of parameters, which are learned using the derivative (or gradient) of a loss function with respect to those parameters. While in most cases gradients can be computed exactly and relatively cheaply, in others the exact computation is either impossible or too expensive and AD must be used in combination with approximation methods. Some of these challenging scenarios arising for example in meta-learning or hyperparameter optimization, can be framed as bilevel optimization problems, where the goal is to minimize an objective function that is evaluated by first solving another optimization problem, the lower-level problem. In this work, we study efficient gradient-based bilevel optimization algorithms for machine learning problems. In particular, we establish convergence rates for some simple approaches to approximate the gradient of the bilevel objective, namely the hypergradient, when the objective is smooth and the lower-level problem consists in finding the fixed point of a contraction map. Leveraging such results, we also prove that the projected inexact hypergradient method achieves a (near) optimal rate of convergence. We establish these results for both the deterministic and stochastic settings. Additionally, we provide an efficient implementation of the methods studied and perform several numerical experiments on hyperparameter optimization, meta-learning, datapoisoning and equilibrium models, which show that our theoretical results are good indicators of the performance in practice.

Type:	Thesis (Doctoral)
Qualification:	Ph.D
Title:	Principled and Efficient Bilevel Optimization for Machine Learning
Open access status:	An open access version is available from UCL Discovery
Language:	English
Additional information:	Copyright © The Author 2023. Original content in this thesis is licensed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification:	UCL UCL > Provost and Vice Provost Offices > UCL BEAMS UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery.ucl.ac.uk/id/eprint/10178090

Downloads since deposit

125Downloads

Download activity - last month

Download activity - last 12 months

Downloads by country - last 12 months

Archive Staff Only

View Item