UCL Discovery

Learning Theory for Distribution Regression

Szabó, Zoltán; Sriperumbudur, Bharath K; Póczos, Barnabás; Gretton, Arthur; (2016) Learning Theory for Distribution Regression. Journal of Machine Learning Research, 17. Green open access

Text: 1411.2066.pdf (603kB). Available under licence: see the attached licence file.

Abstract

We focus on the distribution regression problem: regressing to vector-valued outputs from probability measures. Many important machine learning and statistical tasks fit into this framework, including multi-instance learning and point estimation problems without an analytical solution (such as hyperparameter or entropy estimation). Despite the large number of available heuristics in the literature, the inherent two-stage sampled nature of the problem makes the theoretical analysis quite challenging: in practice only samples from sampled distributions are observable, and the estimates have to rely on similarities computed between sets of points. To the best of our knowledge, the only existing technique with consistency guarantees for distribution regression requires kernel density estimation as an intermediate step (which often performs poorly in practice) and the domain of the distributions to be a compact Euclidean space. In this paper, we study a simple, analytically computable, ridge regression-based alternative to distribution regression: we embed the distributions into a reproducing kernel Hilbert space and learn the regressor from the embeddings to the outputs. Our main contribution is to prove that this scheme is consistent in the two-stage sampled setup under mild conditions (on separable topological domains enriched with kernels): we present an exact computational-statistical efficiency trade-off analysis showing that the studied estimator is able to match the one-stage sampled minimax optimal rate. This result answers a 16-year-old open question, establishing the consistency of the classical set kernel of Haussler (1999) and Gärtner et al. (2002) in regression. We also cover consistency for more recent kernels on distributions, including those due to Christmann and Steinwart (2010).
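
In the scheme studied here, the estimated inner product between the mean embeddings of two bags of samples is the average pairwise kernel value between their points (the set kernel), and learning reduces to kernel ridge regression on the bag-level Gram matrix. The following is a minimal NumPy sketch on synthetic data, not the authors' code: the Gaussian kernel choice, the bandwidth sigma, the regularizer lam, and all variable names are illustrative assumptions.

```python
import numpy as np

def gauss_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix between rows of X (m x d) and Y (n x d)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def set_kernel(bags_a, bags_b, sigma=1.0):
    """Estimated mean-embedding inner products: K[i, j] = mean_{x in A_i, y in B_j} k(x, y)."""
    K = np.empty((len(bags_a), len(bags_b)))
    for i, A in enumerate(bags_a):
        for j, B in enumerate(bags_b):
            K[i, j] = gauss_kernel(A, B, sigma).mean()
    return K

# Toy two-stage sampled data: each bag is drawn from N(mu_i, I), and the
# label is the first coordinate of the (unobserved) mean mu_i.
rng = np.random.default_rng(0)
n_bags, bag_size, d = 50, 30, 2
mus = rng.uniform(-2, 2, size=(n_bags, d))
bags = [mu + rng.standard_normal((bag_size, d)) for mu in mus]
y = mus[:, 0]

# Kernel ridge regression on the embedded bags.
lam = 1e-3
K = set_kernel(bags, bags)
alpha = np.linalg.solve(K + n_bags * lam * np.eye(n_bags), y)

# Predict on a new bag of samples.
test_mu = np.array([1.0, -0.5])
test_bag = [test_mu + rng.standard_normal((bag_size, d))]
y_hat = set_kernel(test_bag, bags) @ alpha
print(f"true: {test_mu[0]:.2f}, predicted: {y_hat[0]:.2f}")
```

Note that only the samples in each bag are used, never the underlying distributions themselves, which is the two-stage sampled setting the paper's consistency analysis addresses.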

Type: Article
Title: Learning Theory for Distribution Regression
Open access status: An open access version is available from UCL Discovery
Publisher version: https://jmlr.csail.mit.edu/papers/volume17/14-510/...
Language: English
Additional information: This version is the author-accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions
Keywords: Two-Stage Sampled Distribution Regression, Kernel Ridge Regression, Mean Embedding, Multi-Instance Learning, Minimax Optimality
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Gatsby Computational Neurosci Unit
URI: https://discovery.ucl.ac.uk/id/eprint/1455553

