Bayesian nonparametric clustering based on Dirichlet processes
Doctoral thesis, UCL (University College London).
Following a review of traditional clustering methods, we review the Bayesian nonparametric framework for modelling object attribute differences. We focus on Dirichlet process (DP) mixture models, in which the clusters observed in any particular data set are not viewed as drawn from a fixed, finite set, but as representatives of a latent structure comprising a potentially infinite number of clusters. As more information about attribute differences is revealed, the number of inferred clusters is allowed to grow. We begin by studying DP mixture models for normal data and show how to adapt one of the most widely used conditional computational methods to improve sampling efficiency. This scheme is then generalised and subsequently applied to discrete data. The DP’s dispersion parameter is critical in controlling the number of clusters, and we propose a percentile-based framework for specifying the hyperparameters of this parameter. This research was motivated by the analysis of product trials at the magazine Which?, where brand attributes are usually assessed on a 5-point preference scale by experts or by a random selection of Which? subscribers. We conclude with a simulation study in which we replicate some of the standard trials at Which? and compare the performance of our DP mixture models against various popular frequentist and Bayesian multiple-comparison routines adapted for clustering.
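The way a DP prior lets the number of clusters grow with the data, governed by its dispersion (concentration) parameter, can be illustrated with a draw from the Chinese restaurant process, the predictive scheme induced by the DP. The sketch below is illustrative only and is not the thesis's model or code; the function name and parameters are our own.

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sample cluster sizes for n items from a Chinese restaurant process
    with dispersion (concentration) parameter alpha."""
    rng = random.Random(seed)
    counts = []  # size of each cluster formed so far
    for i in range(n):
        # Item i joins existing cluster k with probability counts[k] / (i + alpha),
        # or starts a new cluster with probability alpha / (i + alpha).
        r = rng.uniform(0.0, i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1
                break
        else:
            counts.append(1)  # open a new cluster

    return counts

# Larger alpha tends to produce more clusters for the same sample size,
# which is why the choice of prior for alpha matters so much in practice.
sizes = crp_partition(50, alpha=1.0)
```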
Title: Bayesian nonparametric clustering based on Dirichlet processes
Open access status: An open access version is available from UCL Discovery
UCL classification: UCL > School of BEAMS > Faculty of Maths and Physical Sciences > Statistical Science