UCL Discovery

When Do Flat Minima Optimizers Work?

Kaddour, Jean; Liu, Linqing; Silva, Ricardo; Kusner, Matt; (2022) When Do Flat Minima Optimizers Work? In: NeurIPS Proceedings. NeurIPS: New Orleans, LA, USA. Green open access

2202.00661.pdf - Published Version

Download (3MB) | Preview

Abstract

Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have received significant attention due to their scalability: 1. Stochastic Weight Averaging (SWA), and 2. Sharpness-Aware Minimization (SAM). However, there has been limited investigation into their properties and no systematic benchmarking of them across different domains. We fill this gap here by comparing the loss surfaces of the models trained with each method and through broad benchmarking across computer vision, natural language processing, and graph representation learning tasks. We discover several surprising findings from these results, which we hope will help researchers further improve deep learning optimizers, and practitioners identify the right optimizer for their problem.
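
For readers unfamiliar with the two methods named in the abstract, the following is a minimal sketch of their standard update rules, not the paper's experimental code. It assumes a toy two-dimensional quadratic loss, and the function names (`sgd_step`, `sam_step`, `swa`) and hyperparameters (`lr`, `rho`, `start`) are hypothetical choices made for illustration. SWA keeps a running average of the weights visited by a base optimizer; SAM takes its gradient at a point perturbed in the direction of steepest ascent.

```python
import math


def loss(w):
    # Toy quadratic loss (an assumption for illustration, not from the paper).
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [w[0], 10.0 * w[1]]


def sgd_step(w, lr):
    # Plain gradient descent step, used here as the base optimizer.
    return [wi - lr * gi for wi, gi in zip(w, grad(w))]


def sam_step(w, lr, rho):
    # SAM-style step: move to w + rho * g / ||g||, take the gradient there,
    # and apply that gradient to the original weights.
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid division by zero
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def swa(w0, lr, steps, start):
    # SWA-style averaging: run the base optimizer and keep a running mean
    # of the iterates produced from step index `start` onwards.
    w = list(w0)
    avg, n = None, 0
    for t in range(steps):
        w = sgd_step(w, lr)
        if t >= start:
            n += 1
            if avg is None:
                avg = list(w)
            else:
                avg = [a + (wi - a) / n for a, wi in zip(avg, w)]
    return w, avg


if __name__ == "__main__":
    w_sam = sam_step([1.0, 1.0], lr=0.05, rho=0.05)
    w_last, w_avg = swa([1.0, 1.0], lr=0.05, steps=20, start=10)
    print("loss after one SAM step:", loss(w_sam))
    print("loss at last iterate:", loss(w_last))
    print("loss at SWA average:", loss(w_avg))
```

On a deterministic convex toy problem like this one, averaging has no particular benefit; the sketch only shows the mechanics of each update.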

Type: Proceedings paper
Title: When Do Flat Minima Optimizers Work?
Event: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Open access status: An open access version is available from UCL Discovery
Publisher version: https://proceedings.neurips.cc/paper_files/paper/2...
Language: English
Additional information: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/10166998