eprintid: 10203622
rev_number: 14
eprint_status: archive
userid: 699
dir: disk0/10/20/36/22
datestamp: 2025-02-28 10:26:37
lastmod: 2025-02-28 10:26:37
status_changed: 2025-02-28 10:26:37
type: thesis
metadata_visibility: show
sword_depositor: 699
creators_name: Lyu, Zhaoyan
title: On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures
ispublished: unpub
divisions: UCL
divisions: B04
divisions: F46
note: Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
abstract: Despite significant progress in machine learning, the optimization and generalization behaviour of neural networks remains poorly understood. In particular, it is largely unknown how deep learning models generalize under a given training protocol on a particular dataset. This thesis addresses the challenge by exploring the dynamics of neural network optimization through information bottleneck-inspired methodologies. Concretely, by adopting a minimum mean-squared error (MMSE) based measure and a conditional entropy based measure, the thesis proposes a novel approach to quantifying the relationships between neural representations, input data, and ground-truth labels. These measures are easier to estimate than mutual information, paving the way to more reliable insight into the optimization and generalization process of neural networks. The proposed approach reveals that feed-forward neural networks typically follow a two-phase generalization pathway, an initial fitting phase followed by a compression phase, which appears critical to bolstering model generalization. However, the approach also reveals that some models do not exhibit both phases, notably architectures featuring identity shortcuts, such as Transformers. The thesis further explores how such models achieve information compression and discusses the need for fitting and compression in various scenarios. Additionally, it provides an analytical perspective on atypical training behaviours, including "grokking", highlighting the intricate dynamics at play in neural network training. Overall, the thesis contributes to the body of work seeking to explain the behaviour of state-of-the-art machine learning models.
date: 2025-01-28
date_type: published
oa_status: green
full_text_type: other
thesis_class: doctoral_open
thesis_award: Ph.D
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 2353791
lyricists_name: Lyu, Zhaoyan
lyricists_id: ZLYUX31
actors_name: Lyu, Zhaoyan
actors_id: ZLYUX31
actors_role: owner
full_text_status: public
pagerange: 1-249
pages: 249
institution: UCL (University College London)
department: Electronic & Electrical Engineering
thesis_type: Doctoral
citation: Lyu, Zhaoyan; (2025) On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures. Doctoral thesis (Ph.D), UCL (University College London). Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10203622/1/Lyu_10203622_Thesis.pdf