Exploiting structure defined by data in machine learning: some new analyses

Advanced search
Browse by:

Department | Year

UCL Theses | Latest

Deposit your research

Bookmark & Share

Exploiting structure defined by data in machine learning: some new analyses

Lever, G.; (2011) Exploiting structure defined by data in machine learning: some new analyses. Doctoral thesis , UCL (University College London). Green open access

Preview

PDF
1302070.pdf
Download (2MB)

Abstract

This thesis offers some new analyses and presents some new methods for learning in the context of exploiting structure defined by data – for example, when a data distribution has a submanifold support, exhibits cluster structure or exists as an object such as a graph. 1. We present a new PAC-Bayes analysis of learning in this context, which is sharp and in some ways presents a better solution than uniform convergence methods. The PAC-Bayes prior over a hypothesis class is defined in terms of the unknown true risk and smoothness of hypotheses w.r.t. the unknown data-generating distribution. The analysis is “localized” in the sense that complexity of the model enters not as the complexity of an entire hypothesis class, but focused on functions of ultimate interest. Such bounds are derived for various algorithms including SVMs. 2. We consider an idea similar to the p-norm Perceptron for building classifiers on graphs. We define p-norms on the space of functions over graph vertices and consider interpolation using the pnorm as a smoothness measure. The method exploits cluster structure and attains a mistake bound logarithmic in the diameter, compared to a linear lower bound for standard methods. 3. Rademacher complexity is related to cluster structure in data, quantifying the notion that when data clusters we can learn well with fewer examples. In particular we relate transductive learning to cluster structure in the empirical resistance metric. 4. Typical methods for learning over a graph do not scale well in the number of data points – often a graph Laplacian must be inverted which becomes computationally intractable for large data sets. We present online algorithms which, by simplifying the graph in principled way, are able to exploit the structure while remaining computationally tractable for large datasets. We prove state-of-the-art performance guarantees.

Type:	Thesis (Doctoral)
Title:	Exploiting structure defined by data in machine learning: some new analyses
Open access status:	An open access version is available from UCL Discovery
Language:	English
UCL classification:	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery.ucl.ac.uk/id/eprint/1302070

Downloads since deposit

291Downloads

Download activity - last month

Download activity - last 12 months

Downloads by country - last 12 months

Archive Staff Only

View Item