<> <http://www.w3.org/2000/01/rdf-schema#comment> "The repository administrator has not yet configured an RDF license."^^<http://www.w3.org/2001/XMLSchema#string> .
<> <http://xmlns.com/foaf/0.1/primaryTopic> <https://discovery.ucl.ac.uk/id/eprint/10203622> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Thesis> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Article> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://purl.org/dc/terms/title> "On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://purl.org/ontology/bibo/abstract> "Despite the fact that there has been significant progress in machine learning, the optimization and generalization behaviour of neural networks is still poorly understood. In particular, it is largely unknown how deep learning-like models are able to generalize under a given training protocol for a particular dataset. \r\n\r\nThe thesis addresses this challenge by exploring the dynamics of neural network optimization through information bottleneck-like methodologies. Concretely, by adopting a minimal mean-squared error (MMSE) based measure and a conditional entropy based measure, the thesis proposes a novel approach to quantifying the relationship between neural representations, input data, and ground-truth labels. This approach has the advantage that these proposed measures are easier to estimate than mutual information, thereby paving the way to shed light into a neural network optimization and generalization process more reliably. \r\n\r\nThe proposed approach reveals that feed-forward neural networks typically exhibit a two-phase generalization pathway, which include an initial fitting phase followed by a subsequent compression phase, that seem to be critical to bolster model generalization. However, the proposed approach also reveals there are some models that do not exhibit both these phases, notably models featuring identity shortcuts such as Transformers. \r\n\r\nThis thesis further explores how such models achieve information compression and discusses the need for fitting and compression in various scenarios. Additionally, the study provides an analytical perspective on atypical training behaviours, including ``grokking'', highlighting the intricate dynamics at play in neural network training. \r\n\r\nOverall, the thesis contributes to the body of work attempting to shed light on the behaviour of state-of-the-art machine learning models."^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://purl.org/dc/terms/date> "2025-01-28" .
<https://discovery.ucl.ac.uk/id/document/1819179> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Document> .
<https://discovery.ucl.ac.uk/id/org/ext-a64c3df5861c6582807add1abaadf2af> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
<https://discovery.ucl.ac.uk/id/org/ext-a64c3df5861c6582807add1abaadf2af> <http://xmlns.com/foaf/0.1/name> "UCL (University College London)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://purl.org/dc/terms/issuer> <https://discovery.ucl.ac.uk/id/org/ext-a64c3df5861c6582807add1abaadf2af> .
<https://discovery.ucl.ac.uk/id/org/ext-b4deaa30b5b2a5224ec421d60322a070> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
<https://discovery.ucl.ac.uk/id/org/ext-b4deaa30b5b2a5224ec421d60322a070> <http://xmlns.com/foaf/0.1/name> "Electronic & Electrical Engineering, UCL (University College London)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/org/ext-b4deaa30b5b2a5224ec421d60322a070> <http://purl.org/dc/terms/isPartOf> <https://discovery.ucl.ac.uk/id/org/ext-a64c3df5861c6582807add1abaadf2af> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://purl.org/dc/terms/issuer> <https://discovery.ucl.ac.uk/id/org/ext-b4deaa30b5b2a5224ec421d60322a070> .
<https://discovery.ucl.ac.uk/id/org/ext-a64c3df5861c6582807add1abaadf2af> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/org/ext-b4deaa30b5b2a5224ec421d60322a070> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://purl.org/ontology/bibo/status> <http://purl.org/ontology/bibo/status/unpublished> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-992bf7d76a6dd2c1840b3450f13b4d14> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://purl.org/ontology/bibo/authorList> <https://discovery.ucl.ac.uk/id/eprint/10203622#authors> .
<https://discovery.ucl.ac.uk/id/eprint/10203622#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <https://discovery.ucl.ac.uk/id/person/ext-992bf7d76a6dd2c1840b3450f13b4d14> .
<https://discovery.ucl.ac.uk/id/person/ext-992bf7d76a6dd2c1840b3450f13b4d14> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<https://discovery.ucl.ac.uk/id/person/ext-992bf7d76a6dd2c1840b3450f13b4d14> <http://xmlns.com/foaf/0.1/givenName> "Zhaoyan"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-992bf7d76a6dd2c1840b3450f13b4d14> <http://xmlns.com/foaf/0.1/familyName> "Lyu"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-992bf7d76a6dd2c1840b3450f13b4d14> <http://xmlns.com/foaf/0.1/name> "Zhaoyan Lyu"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/EPrint> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/ThesisEPrint> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://purl.org/dc/terms/isPartOf> <https://discovery.ucl.ac.uk/id/repository> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/document/1819179> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> .
<https://discovery.ucl.ac.uk/id/document/1819179> <http://www.w3.org/2000/01/rdf-schema#label> "On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures (Text)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/document/1819179> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10203622/1/Lyu_10203622_Thesis.pdf> .
<https://discovery.ucl.ac.uk/id/document/1819179> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10203622/1/Lyu_10203622_Thesis.pdf> .
<https://discovery.ucl.ac.uk/id/eprint/10203622/1/Lyu_10203622_Thesis.pdf> <http://www.w3.org/2000/01/rdf-schema#label> "Lyu_10203622_Thesis.pdf"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/1819180> .
<https://discovery.ucl.ac.uk/id/document/1819180> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> .
<https://discovery.ucl.ac.uk/id/document/1819180> <http://www.w3.org/2000/01/rdf-schema#label> "On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures (Other)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/document/1819180> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/document/1819180> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/document/1819180> <http://eprints.org/relation/isIndexCodesVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/1819181> .
<https://discovery.ucl.ac.uk/id/document/1819181> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> .
<https://discovery.ucl.ac.uk/id/document/1819181> <http://www.w3.org/2000/01/rdf-schema#label> "On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures (Other)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/document/1819181> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/document/1819181> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/document/1819181> <http://eprints.org/relation/islightboxThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/1819182> .
<https://discovery.ucl.ac.uk/id/document/1819182> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> .
<https://discovery.ucl.ac.uk/id/document/1819182> <http://www.w3.org/2000/01/rdf-schema#label> "On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures (Other)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/document/1819182> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/document/1819182> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/document/1819182> <http://eprints.org/relation/ispreviewThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/1819183> .
<https://discovery.ucl.ac.uk/id/document/1819183> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> .
<https://discovery.ucl.ac.uk/id/document/1819183> <http://www.w3.org/2000/01/rdf-schema#label> "On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures (Other)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/document/1819183> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/document/1819183> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/document/1819183> <http://eprints.org/relation/ismediumThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/1819184> .
<https://discovery.ucl.ac.uk/id/document/1819184> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> .
<https://discovery.ucl.ac.uk/id/document/1819184> <http://www.w3.org/2000/01/rdf-schema#label> "On the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures (Other)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/document/1819184> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/document/1819184> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/document/1819184> <http://eprints.org/relation/issmallThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/1819179> .
<https://discovery.ucl.ac.uk/id/eprint/10203622> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <https://discovery.ucl.ac.uk/id/eprint/10203622/> .
<https://discovery.ucl.ac.uk/id/eprint/10203622/> <http://purl.org/dc/elements/1.1/title> "HTML Summary of #10203622 \n\nOn the Pathway to State-of-the-art Machine Learning Models Generalization: Exploring the Dynamics of Neural Networks Through Information Bottleneck-Inspired Measures\n\n" .
<https://discovery.ucl.ac.uk/id/eprint/10203622/> <http://purl.org/dc/elements/1.1/format> "text/html" .
<https://discovery.ucl.ac.uk/id/eprint/10203622/> <http://xmlns.com/foaf/0.1/primaryTopic> <https://discovery.ucl.ac.uk/id/eprint/10203622> .