Huang, Kevin Han; (2025) Universality beyond the classical asymptotic regime. Doctoral thesis (Ph.D), UCL (University College London).
Text: Huang_10212720_Thesis.pdf (4MB)
Abstract
A typical learning problem involves training an estimator f(X1, ..., Xn) on a data set X1, ..., Xn. Gaussian universality is the observation that, for many potentially complicated estimators, properties of the estimator are preserved when the training data are replaced by appropriately chosen Gaussian data. This unlocks a wide range of empirical and theoretical tools for analysing the trained estimator, since Gaussian distributions are both analytically tractable and computationally fast to simulate. Universality results have been observed in statistical physics, random matrix theory and other branches of probability; in recent papers, they have been theoretically and/or empirically established for several high-dimensional models across statistics and machine learning (ML). One crucial question is the extent to which universality holds under high dimensionality and dependence. To address this, this thesis develops Gaussian universality results for a general class of estimators of high-dimensional data, with nearly matching upper and lower bounds. The results cover any f well-approximated by strictly monotone functions of polynomials whose degree grows sufficiently slowly with the sample size n. No explicit requirements are imposed on the number of data dimensions relative to n. Together with the fourth moment phenomenon of Nualart and Peccati (2005), our results imply necessary and sufficient conditions for the asymptotic normality of approximately polynomial estimators. The remainder of the thesis focuses on how universality results can recover, extend and establish new high-dimensional analyses across statistics and machine learning. These include: (i) a complete distributional characterisation, via a moment ratio, of high-dimensional U-statistics used for kernel-based testing; (ii) a high-dimensional delta method; (iii) a finite-sample approximation of subgraph count statistics that recovers known geometric conditions; (iv) a characterisation of the unexpected effects of dependence under the popular ML practice of data augmentation; and (v) an analysis of optimisation algorithms used in ML and AI for Science.
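To make the notion of Gaussian universality concrete, the following minimal sketch (an illustration assuming a simple ridge-regression estimator with i.i.d. design entries; it is not taken from the thesis, which treats far more general estimators and dependent data) compares the empirical distribution of an estimator's training loss when the design matrix has Rademacher entries versus Gaussian entries with matched mean and variance. Universality predicts that the two distributions are close as the sample size and dimension grow.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_train_loss(X, y, lam=1.0):
    """Training loss (1/n) * ||y - X @ beta_hat||^2 of ridge regression."""
    n, d = X.shape
    beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    resid = y - X @ beta_hat
    return resid @ resid / n

def simulate(n=200, d=100, reps=500, design="rademacher"):
    """Empirical distribution of the training loss under a given design law.

    The parameter choices (n, d, lam, the fixed signal) are illustrative
    assumptions, not values used in the thesis.
    """
    beta_star = np.ones(d) / np.sqrt(d)  # fixed signal of unit norm
    losses = []
    for _ in range(reps):
        if design == "rademacher":
            # Non-Gaussian entries: +/-1 with equal probability (mean 0, variance 1)
            X = rng.choice([-1.0, 1.0], size=(n, d))
        else:
            # Gaussian entries with the same first two moments
            X = rng.standard_normal((n, d))
        y = X @ beta_star + rng.standard_normal(n)
        losses.append(ridge_train_loss(X, y))
    return np.array(losses)

rade = simulate(design="rademacher")
gaus = simulate(design="gaussian")
print(f"Rademacher design: mean {rade.mean():.4f}, std {rade.std():.4f}")
print(f"Gaussian design:   mean {gaus.mean():.4f}, std {gaus.std():.4f}")
```

Matching the first two moments of the entry distributions is the key ingredient here: the Gaussian surrogate can then be analysed in closed form or simulated cheaply, while the observed statistics remain close to those computed on the original, non-Gaussian data.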
| Field | Value |
|---|---|
| Type | Thesis (Doctoral) |
| Qualification | Ph.D |
| Title | Universality beyond the classical asymptotic regime |
| Open access status | An open access version is available from UCL Discovery |
| Language | English |
| Additional information | Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
| UCL classification | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Gatsby Computational Neurosci Unit |
| URI | https://discovery.ucl.ac.uk/id/eprint/10212720 |