Lyu, Zhaoyan;
(2025)
On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures.
Doctoral thesis (Ph.D), UCL (University College London).
Abstract
Despite significant progress in machine learning, the optimization and generalization behaviour of neural networks remains poorly understood. In particular, it is largely unknown how deep learning models generalize under a given training protocol on a particular dataset. This thesis addresses the challenge by exploring the dynamics of neural network optimization through information bottleneck-inspired methodologies. Concretely, by adopting a minimum mean-squared error (MMSE) based measure and a conditional entropy based measure, the thesis proposes a novel approach to quantifying the relationships between neural representations, input data, and ground-truth labels. These measures are easier to estimate than mutual information, paving the way to shed light on the neural network optimization and generalization process more reliably. The proposed approach reveals that feed-forward neural networks typically follow a two-phase generalization pathway, an initial fitting phase followed by a compression phase, which appears critical to bolstering model generalization. However, the approach also reveals that some models do not exhibit both phases, notably models featuring identity shortcuts such as Transformers. The thesis further explores how such models achieve information compression and discusses the need for fitting and compression in various scenarios. Additionally, the study provides an analytical perspective on atypical training behaviours, including "grokking", highlighting the intricate dynamics at play in neural network training. Overall, the thesis contributes to the body of work attempting to shed light on the behaviour of state-of-the-art machine learning models.
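The abstract does not specify how its MMSE-based measure is computed. As a hedged sketch only (not the thesis's actual estimator), one common proxy is a ridge-regularized linear probe: the mean-squared error of the best linear predictor of a target (inputs or labels) from a layer's representations. Tracking such a quantity across training epochs is one way to look for the fitting and compression phases the abstract describes; the function name `linear_mmse` and its regularization constant are illustrative assumptions.

```python
import numpy as np

def linear_mmse(reps, targets, reg=1e-6):
    """Estimate the MMSE of linearly predicting `targets` (e.g. inputs or
    one-hot labels) from `reps` (a layer's activations). Lower values mean
    the representation retains more information about the target."""
    # Centre both variables so the linear predictor needs no bias term.
    R = reps - reps.mean(axis=0)
    Y = targets - targets.mean(axis=0)
    # Ridge-regularised least squares: W minimises ||Y - R W||^2 + reg*||W||^2.
    cov = R.T @ R + reg * np.eye(R.shape[1])
    W = np.linalg.solve(cov, R.T @ Y)
    residual = Y - R @ W
    return float(np.mean(residual ** 2))

# Toy check: a representation that fully determines the target gives ~0 MMSE,
# while an unrelated target keeps a large residual error.
rng = np.random.default_rng(0)
T = rng.normal(size=(256, 8))
labels = T @ rng.normal(size=(8, 3))   # labels are a linear function of T
noise = rng.normal(size=(256, 3))      # independent of T
print(linear_mmse(T, labels))          # near zero
print(linear_mmse(T, noise))           # stays close to the noise variance
```

In practice one would apply such a probe per layer and per epoch; a nonlinear or kernelized predictor would give a tighter MMSE estimate, at the cost of the simplicity that makes this quantity easier to estimate than mutual information.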
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng |
URI: | https://discovery.ucl.ac.uk/id/eprint/10203622 |



