eprintid: 10203622
rev_number: 14
eprint_status: archive
userid: 699
dir: disk0/10/20/36/22
datestamp: 2025-02-28 10:26:37
lastmod: 2025-02-28 10:26:37
status_changed: 2025-02-28 10:26:37
type: thesis
metadata_visibility: show
sword_depositor: 699
creators_name: Lyu, Zhaoyan
title: On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures
ispublished: unpub
divisions: UCL
divisions: B04
divisions: F46
note: Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
abstract: Despite significant progress in machine learning, the optimization and generalization behaviour of neural networks remains poorly understood. In particular, it is largely unknown how deep learning models generalize under a given training protocol on a particular dataset. This thesis addresses the challenge by exploring the dynamics of neural network optimization through information bottleneck-inspired methodologies. Concretely, by adopting a minimum mean-squared error (MMSE) based measure and a conditional entropy based measure, the thesis proposes a novel approach to quantifying the relationships between neural representations, input data, and ground-truth labels. These measures are easier to estimate than mutual information, paving the way to more reliable insight into the optimization and generalization process of neural networks. The proposed approach reveals that feed-forward neural networks typically follow a two-phase generalization pathway, an initial fitting phase followed by a compression phase, which appears critical to bolstering model generalization. However, the approach also reveals that some models do not exhibit both phases, notably architectures featuring identity shortcuts, such as Transformers. The thesis further explores how such models achieve information compression and discusses the need for fitting and compression in various scenarios. Additionally, it provides an analytical perspective on atypical training behaviours, including "grokking", highlighting the intricate dynamics at play in neural network training. Overall, the thesis contributes to the body of work seeking to explain the behaviour of state-of-the-art machine learning models.
date: 2025-01-28
date_type: published
oa_status: green
full_text_type: other
thesis_class: doctoral_open
thesis_award: Ph.D
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 2353791
lyricists_name: Lyu, Zhaoyan
lyricists_id: ZLYUX31
actors_name: Lyu, Zhaoyan
actors_id: ZLYUX31
actors_role: owner
full_text_status: public
pagerange: 1-249
pages: 249
institution: UCL (University College London)
department: Electronic & Electrical Engineering
thesis_type: Doctoral
citation: Lyu, Zhaoyan; (2025) On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures. Doctoral thesis (Ph.D), UCL (University College London). Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10203622/1/Lyu_10203622_Thesis.pdf