On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures

Lyu, Zhaoyan; (2025) On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Lyu_10203622_Thesis.pdf (Download, 53MB)

Abstract

Despite significant progress in machine learning, the optimization and generalization behaviour of neural networks is still poorly understood. In particular, it is largely unknown how deep learning models are able to generalize under a given training protocol for a particular dataset. This thesis addresses the challenge by exploring the dynamics of neural network optimization through information bottleneck-like methodologies. Concretely, by adopting a minimum mean-squared error (MMSE) based measure and a conditional-entropy-based measure, the thesis proposes a novel approach to quantifying the relationship between neural representations, input data, and ground-truth labels. These measures are easier to estimate than mutual information, paving the way to shedding light on the optimization and generalization process of neural networks more reliably. The proposed approach reveals that feed-forward neural networks typically exhibit a two-phase generalization pathway, consisting of an initial fitting phase followed by a compression phase, which appears critical to bolstering model generalization. However, the approach also reveals that some models do not exhibit both phases, notably models featuring identity shortcuts, such as Transformers. The thesis further explores how such models achieve information compression and discusses the need for fitting and compression in various scenarios. Additionally, the study provides an analytical perspective on atypical training behaviours, including "grokking", highlighting the intricate dynamics at play in neural network training. Overall, the thesis contributes to the body of work attempting to shed light on the behaviour of state-of-the-art machine learning models.
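
As a rough illustration of why such measures can be easier to estimate than mutual information, the sketch below computes probe-based proxies: the held-out mean-squared error of a regressor reading a target from a hidden representation T (an MMSE-style quantity) and the held-out cross-entropy of a classifier reading the labels from T (a conditional-entropy-style quantity). This is a minimal sketch and not the thesis's estimators; the linear and ridge probes, the synthetic data, and the helper names mmse_proxy and conditional_entropy_proxy are illustrative assumptions.

# Sketch only: probe-based proxies for an MMSE-style measure and a
# conditional-entropy-style measure between a hidden representation T
# and a target; the probe choice and the split are assumptions.
import numpy as np
from sklearn.linear_model import Ridge, LogisticRegression
from sklearn.metrics import mean_squared_error, log_loss
from sklearn.model_selection import train_test_split

def mmse_proxy(T, y, alpha=1.0):
    """Held-out MSE of a ridge probe predicting y from T.
    Lower values suggest T retains more information about y."""
    T_tr, T_te, y_tr, y_te = train_test_split(T, y, test_size=0.3, random_state=0)
    probe = Ridge(alpha=alpha).fit(T_tr, y_tr)
    return mean_squared_error(y_te, probe.predict(T_te))

def conditional_entropy_proxy(T, labels):
    """Held-out cross-entropy (in nats) of a linear classifier
    predicting the labels from T, an H(Y|T)-style quantity."""
    T_tr, T_te, y_tr, y_te = train_test_split(T, labels, test_size=0.3, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(T_tr, y_tr)
    return log_loss(y_te, probe.predict_proba(T_te), labels=probe.classes_)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 20))                 # stand-in input data
    labels = (X[:, 0] + X[:, 1] > 0).astype(int)    # stand-in ground-truth labels
    T = np.tanh(X @ rng.normal(size=(20, 10)))      # stand-in hidden representation
    print("MMSE-style proxy (T -> X[:, 0]):", mmse_proxy(T, X[:, 0]))
    print("H(Y|T)-style proxy (nats):", conditional_entropy_proxy(T, labels))

Tracking such proxies layer by layer and epoch by epoch is one cheap way to visualise fitting and compression dynamics of the kind the abstract describes, without estimating mutual information directly.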

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng
URI: https://discovery.ucl.ac.uk/id/eprint/10203622