UCL Discovery

Topics in kernel hypothesis testing

Chwialkowski, KP; (2016) Topics in kernel hypothesis testing. Doctoral thesis, UCL (University College London). Green open access

Text: ThesisKacperChwialkowski.pdf (1MB)

Abstract

This thesis investigates some unaddressed problems in kernel nonparametric hypothesis testing. The contributions are grouped around three main themes.

Wild Bootstrap for Degenerate Kernel Tests. A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed. This bootstrap method is used to construct provably consistent tests that apply to random processes. It applies to a large group of kernel tests based on V-statistics, which are degenerate under the null hypothesis and non-degenerate elsewhere. In experiments, the wild bootstrap gives strong performance on synthetic examples, on audio data, and in performance benchmarking for the Gibbs sampler.

A Kernel Test of Goodness of Fit. A nonparametric statistical test for goodness-of-fit is proposed: given a set of samples, the test determines how likely it is that these were generated from a target density function. The measure of goodness-of-fit is a divergence constructed via Stein's method using functions from a Reproducing Kernel Hilbert Space. Construction of the test is based on the wild bootstrap method. We apply our test to quantifying convergence of approximate Markov Chain Monte Carlo methods, statistical model criticism, and evaluating quality of fit vs model complexity in nonparametric density estimation.

Fast Analytic Functions Based Two Sample Test. A class of nonparametric two-sample tests with a cost linear in the sample size is proposed. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. Experiments on artificial benchmarks and on challenging real-world testing problems demonstrate a good power/time tradeoff that is retained even in high-dimensional problems.

The main contributions to science are the following. We prove that the kernel tests based on the wild bootstrap method tightly control the type one error at the desired level and are consistent, i.e. the type two error drops to zero as the number of samples increases. We construct a kernel goodness-of-fit test that requires knowledge of the density only up to a normalizing constant. We use this test to construct the first consistent test for convergence of Markov chains and use it to quantify properties of approximate MCMC algorithms. Finally, we construct a linear-time two-sample test that uses a new, finite-dimensional feature representation of probability measures.
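
As a rough illustration of the goodness-of-fit test described in the abstract, the sketch below computes a Stein-based V-statistic for one-dimensional samples and calibrates it with sign-flipping wild bootstrap multipliers. It is a minimal sketch under stated assumptions, not the thesis implementation: the Gaussian RBF base kernel, the fixed bandwidth, the one-dimensional setting, the sign-flip chain controlled by flip_prob, and all function names are choices made here for illustration. The target density enters only through its score function (the derivative of the log density), so no normalizing constant is needed.

```python
import numpy as np


def stein_kernel_matrix(x, score, bandwidth=1.0):
    """Stein kernel h(x_i, x_j) for 1-D samples (assumed Gaussian RBF base kernel)."""
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]                 # pairwise differences x_i - x_j
    k = np.exp(-d**2 / (2.0 * bandwidth**2))    # base kernel k(x_i, x_j)
    dk_dx = -d / bandwidth**2 * k               # derivative in the first argument
    dk_dy = d / bandwidth**2 * k                # derivative in the second argument
    d2k = (1.0 / bandwidth**2 - d**2 / bandwidth**4) * k  # mixed second derivative
    s = score(x)                                # score of the target at each sample
    return (s[:, None] * s[None, :] * k
            + s[:, None] * dk_dy
            + s[None, :] * dk_dx
            + d2k)


def wild_bootstrap_signs(n, flip_prob, rng):
    """Chain of +/-1 multipliers that flips sign with probability flip_prob.

    flip_prob=0.5 gives independent Rademacher signs (an assumption suited
    to i.i.d. samples); smaller values give longer runs of equal signs.
    """
    flips = rng.random(n) < flip_prob
    flips[0] = False
    start = rng.choice([-1.0, 1.0])
    return start * np.where(np.cumsum(flips) % 2 == 0, 1.0, -1.0)


def stein_gof_test(x, score, bandwidth=1.0, n_boot=500, flip_prob=0.5, seed=0):
    """Return (n * V-statistic, wild bootstrap p-value)."""
    rng = np.random.default_rng(seed)
    h = stein_kernel_matrix(x, score, bandwidth)
    n = h.shape[0]
    stat = h.sum() / n                          # n * V_n with V_n = mean of h
    boot = np.empty(n_boot)
    for b in range(n_boot):
        w = wild_bootstrap_signs(n, flip_prob, rng)
        boot[b] = w @ h @ w / n                 # bootstrapped n * V_n
    p_value = (1 + np.sum(boot >= stat)) / (1 + n_boot)
    return stat, p_value


# Usage: test samples against a standard normal target, whose score is -x.
samples = np.random.default_rng(42).normal(2.0, 1.0, size=200)
statistic, p_value = stein_gof_test(samples, score=lambda x: -x)
```

In this sketch a small p-value suggests the samples are unlikely to come from the target. For dependent samples such as MCMC output, flip_prob would need to be chosen smaller than 0.5; how to set it is not specified here.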

Type: Thesis (Doctoral)
Title: Topics in kernel hypothesis testing
Event: UCL
Open access status: An open access version is available from UCL Discovery
Language: English
Keywords: Machine learning, statistics
URI: http://discovery.ucl.ac.uk/id/eprint/1519607