Lever, G.;
(2011)
Exploiting structure defined by data in machine learning: some new analyses.
Doctoral thesis, UCL (University College London).

PDF
1302070.pdf Download (2MB) 
Abstract
This thesis offers some new analyses and presents some new methods for learning in the context of exploiting structure defined by data – for example, when a data distribution has a submanifold support, exhibits cluster structure or exists as an object such as a graph. 1. We present a new PACBayes analysis of learning in this context, which is sharp and in some ways presents a better solution than uniform convergence methods. The PACBayes prior over a hypothesis class is defined in terms of the unknown true risk and smoothness of hypotheses w.r.t. the unknown datagenerating distribution. The analysis is “localized” in the sense that complexity of the model enters not as the complexity of an entire hypothesis class, but focused on functions of ultimate interest. Such bounds are derived for various algorithms including SVMs. 2. We consider an idea similar to the pnorm Perceptron for building classifiers on graphs. We define pnorms on the space of functions over graph vertices and consider interpolation using the pnorm as a smoothness measure. The method exploits cluster structure and attains a mistake bound logarithmic in the diameter, compared to a linear lower bound for standard methods. 3. Rademacher complexity is related to cluster structure in data, quantifying the notion that when data clusters we can learn well with fewer examples. In particular we relate transductive learning to cluster structure in the empirical resistance metric. 4. Typical methods for learning over a graph do not scale well in the number of data points – often a graph Laplacian must be inverted which becomes computationally intractable for large data sets. We present online algorithms which, by simplifying the graph in principled way, are able to exploit the structure while remaining computationally tractable for large datasets. We prove stateoftheart performance guarantees.
Type:  Thesis (Doctoral) 

Title:  Exploiting structure defined by data in machine learning: some new analyses 
Open access status:  An open access version is available from UCL Discovery 
Language:  English 
UCL classification:  UCL > School of BEAMS > Faculty of Engineering Science > Computer Science 
URI:  http://discovery.ucl.ac.uk/id/eprint/1302070 
Archive Staff Only
View Item 