A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing

Huang, Kevin Han; Liu, Xing; Duncan, Andrew B; Gandy, Axel; (2023) A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing. In: Neu, Gergely and Rosasco, Lorenzo, (eds.) Proceedings of Thirty Sixth Conference on Learning Theory. (pp. 3827-3918). Proceedings of Machine Learning Research (PMLR): Bangalore, India. Green open access

Abstract

We prove a convergence theorem for U-statistics of degree two, where the data dimension d is allowed to scale with sample size n. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of its degeneracy and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and asymmetric distribution. Our bounds are valid for any finite n and d, independent of individual eigenvalues of the underlying function, and dimension-independent under a mild assumption. As an application, we apply our theory to two popular kernel-based distribution tests, MMD and KSD, whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with d and the bandwidth.
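
For readers unfamiliar with the objects named in the abstract, the following is a minimal illustrative sketch of a degree-two U-statistic, using the standard unbiased estimator of squared MMD with a Gaussian kernel. It is not taken from the paper: the function names `gaussian_kernel` and `mmd_u_statistic`, the choice of kernel, and the pairing of samples as z_i = (x_i, y_i) are assumptions made for illustration only.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two points given as sequences of floats.

    Assumption: the kernel form exp(-||x - y||^2 / (2 * bandwidth^2)) is
    one common convention, chosen here for illustration.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_u_statistic(xs, ys, bandwidth):
    """Degree-two U-statistic estimate of squared MMD.

    Treats z_i = (x_i, y_i) as one observation and averages the symmetric
    function
        h(z_i, z_j) = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i)
    over all ordered pairs with i != j.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal size n >= 2")
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a U-statistic excludes the diagonal terms
            total += (
                gaussian_kernel(xs[i], xs[j], bandwidth)
                + gaussian_kernel(ys[i], ys[j], bandwidth)
                - gaussian_kernel(xs[i], ys[j], bandwidth)
                - gaussian_kernel(xs[j], ys[i], bandwidth)
            )
    return total / (n * (n - 1))
```

In this sketch the dimension d is simply the length of each point and `bandwidth` is the kernel bandwidth, the two quantities whose effect on test power the abstract refers to. The double loop costs O(n^2 d) and is written for readability rather than speed.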

Type: Proceedings paper
Title: A High-dimensional Convergence Theorem for U-statistics with Applications to Kernel-based Testing
Event: Thirty Sixth Conference on Learning Theory
Open access status: An open access version is available from UCL Discovery
Publisher version: https://proceedings.mlr.press/v195/huang23a.html
Language: English
Additional information: This is an Open Access paper published under a Creative Commons Attribution 4.0 International (CC BY 4.0) Licence (https://creativecommons.org/licenses/by/4.0/).
Keywords: High-dimensional statistics, U-statistics, distribution testing, kernel method
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Gatsby Computational Neurosci Unit
URI: https://discovery.ucl.ac.uk/id/eprint/10170054