UCL Discovery

Decision trees and forests: a probabilistic perspective

Lakshminarayanan, B; (2016) Decision trees and forests: a probabilistic perspective. Doctoral thesis, UCL (University College London). Green open access

Full text: thesis.pdf (Download, 2MB)

Abstract

Decision trees and ensembles of decision trees are very popular in machine learning and often achieve state-of-the-art performance on black-box prediction tasks. However, popular variants such as C4.5, CART, boosted trees and random forests lack a probabilistic interpretation, since they usually just specify an algorithm for training a model. We take a probabilistic approach in which we cast the decision tree structure and the parameters associated with its nodes as a probabilistic model; given labeled examples, we can train the probabilistic model using a variety of approaches (Bayesian learning, maximum likelihood, etc.). The probabilistic approach allows us to encode prior assumptions about tree structures and share statistical strength between node parameters; furthermore, it offers a principled mechanism to obtain probabilistic predictions, which is crucial for applications where uncertainty quantification is important. Existing work on Bayesian decision trees relies on Markov chain Monte Carlo, which can be computationally slow and suffer from poor mixing. We propose a novel sequential Monte Carlo algorithm that computes a particle approximation to the posterior over trees in a top-down fashion. We also propose a novel sampler for Bayesian additive regression trees by combining the above top-down particle filtering algorithm with the Particle Gibbs (Andrieu et al., 2010) framework. Finally, we propose Mondrian forests (MFs), a computationally efficient hybrid solution that is competitive with non-probabilistic counterparts in terms of speed and accuracy, but additionally produces well-calibrated uncertainty estimates. MFs use the Mondrian process (Roy and Teh, 2009) as the randomization mechanism and hierarchically smooth the node parameters within each tree (using a hierarchical probabilistic model and approximate Bayesian updates), but combine the trees in a non-Bayesian fashion. MFs can be grown in an incremental/online fashion and, remarkably, the distribution of online MFs is the same as that of batch MFs.
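To illustrate the Mondrian process mentioned in the abstract as the randomization mechanism behind Mondrian forests, the following minimal Python sketch samples a Mondrian partition of an axis-aligned box: the time to the next cut is exponentially distributed with rate equal to the box's total side length, the split dimension is chosen proportionally to side length, and the cut location is uniform within that dimension; the lifetime `budget` controls tree depth. This is an assumption-laden illustration, not the thesis implementation: the function name `sample_mondrian` and its parameters are hypothetical, and Mondrian forests additionally restrict cuts to the bounding box of the observed data and attach hierarchically smoothed node parameters, which this sketch omits.

```python
import numpy as np

def sample_mondrian(lower, upper, budget, rng):
    """Recursively sample a Mondrian partition of the box [lower, upper].

    lower, upper : arrays giving the box extent in each dimension
    budget       : remaining lifetime; larger budgets yield deeper trees
    Returns a nested dict describing the sampled tree (leaf or split node).
    """
    lengths = upper - lower
    rate = lengths.sum()
    # Time to the next cut is Exp(rate), where rate is the total side length;
    # if it exceeds the remaining budget, the cell becomes a leaf.
    cost = rng.exponential(1.0 / rate) if rate > 0 else np.inf
    if cost >= budget:
        return {"type": "leaf", "lower": lower, "upper": upper}

    # Choose the split dimension proportional to its side length,
    # and the cut location uniformly within that dimension's extent.
    dim = rng.choice(len(lengths), p=lengths / rate)
    cut = rng.uniform(lower[dim], upper[dim])

    left_upper, right_lower = upper.copy(), lower.copy()
    left_upper[dim], right_lower[dim] = cut, cut
    remaining = budget - cost
    return {
        "type": "split", "dim": int(dim), "cut": float(cut),
        "left": sample_mondrian(lower, left_upper, remaining, rng),
        "right": sample_mondrian(right_lower, upper, remaining, rng),
    }

# Example: sample one Mondrian tree on the unit square with lifetime 2.0.
rng = np.random.default_rng(0)
tree = sample_mondrian(np.zeros(2), np.ones(2), budget=2.0, rng=rng)
```

Because the cut time depends only on the cell's linear dimension, the process is self-consistent under restriction, which is the property that lets Mondrian forests be grown incrementally while matching the batch distribution.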

Type: Thesis (Doctoral)
Title: Decision trees and forests: a probabilistic perspective
Event: University College London
Open access status: An open access version is available from UCL Discovery
Language: English
Keywords: Machine learning, decision trees, Bayesian, random forests, probabilistic, Mondrian process, sequential Monte Carlo, predictive uncertainty
UCL classification: UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Gatsby Computational Neurosci Unit
URI: https://discovery.ucl.ac.uk/id/eprint/1498835