UCL Discovery

Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters

Hennig, C; Lin, C-J (2015) Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters. Statistics and Computing, 25 (4), pp. 821-833. doi:10.1007/s11222-015-9566-5. Green open access.

Abstract

There are two notoriously hard problems in cluster analysis: estimating the number of clusters, and checking whether the population to be clustered is not actually homogeneous. Given a dataset, a clustering method and a cluster validation index, this paper proposes to set up null models that capture structural features of the data that cannot be interpreted as indicating clustering. Artificial datasets are sampled from the null model, with parameters estimated from the original dataset. This can be used to test the null hypothesis of a homogeneous population against a clustering alternative. It can also be used to calibrate the validation index for estimating the number of clusters, by taking into account the expected distribution of the index under the null model for any given number of clusters. The approach is illustrated by three examples involving various clustering techniques (partitioning around medoids, hierarchical methods, a Gaussian mixture model), validation indexes (average silhouette width, prediction strength and BIC), and issues such as mixed-type data and temporal and spatial autocorrelation.
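The testing idea in the abstract (fit a null model to the data, simulate artificial datasets from it, and compare the validation index of the real data with its null distribution) can be illustrated with a minimal sketch. It is not the authors' implementation and makes several simplifying assumptions: the null model is a single Gaussian with independent coordinates whose means and standard deviations are estimated from the data; k-means (Lloyd's algorithm) stands in for the clustering methods used in the paper; the index is the average silhouette width. The paper's null models are richer, capturing features such as mixed-type data and autocorrelation. All function names are hypothetical, and the code uses only the Python standard library (3.8 or later).

```python
import math
import random
import statistics


def kmeans(points, k, rng, n_iter=50):
    """Plain Lloyd's k-means on a list of tuples; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest current center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(statistics.fmean(col) for col in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def average_silhouette_width(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points; singletons count as 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0  # the silhouette is undefined for a single cluster
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            continue
        a = statistics.fmean(own)
        b = min(
            statistics.fmean([math.dist(p, q) for q, lab in zip(points, labels) if lab == c])
            for c in clusters
            if c != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def fit_null(points):
    """Assumed null model: independent Gaussian coordinates, mean/sd estimated from data."""
    return [(statistics.fmean(col), statistics.stdev(col)) for col in zip(*points)]


def sample_null(params, n, rng):
    """Draw one artificial dataset of size n from the fitted null model."""
    return [tuple(rng.gauss(mu, sd) for mu, sd in params) for _ in range(n)]


def bootstrap_homogeneity_test(points, k=2, n_boot=49, seed=0):
    """Monte Carlo p-value for H0 'homogeneous (single Gaussian)' vs clustering."""
    rng = random.Random(seed)
    observed = average_silhouette_width(points, kmeans(points, k, rng))
    params = fit_null(points)
    null_stats = []
    for _ in range(n_boot):
        fake = sample_null(params, len(points), rng)
        null_stats.append(average_silhouette_width(fake, kmeans(fake, k, rng)))
    # Large silhouette widths indicate clustering, so the test is one-sided.
    exceed = sum(s >= observed for s in null_stats)
    return observed, null_stats, (1 + exceed) / (1 + n_boot)


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(30)]
    data += [(data_rng.gauss(6, 1), data_rng.gauss(6, 1)) for _ in range(30)]
    obs, null, p = bootstrap_homogeneity_test(data)
    print(f"ASW = {obs:.3f}, largest null ASW = {max(null):.3f}, p = {p:.3f}")
```

With 49 null datasets the smallest attainable p-value is 1/50 = 0.02. Because the fitted null model does not depend on k, the same simulation can be repeated for k = 2, 3, ... to obtain the null distribution of the index for each number of clusters, which is the kind of calibration the abstract describes for estimating the number of clusters.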

Type: Article
Title: Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s11222-015-9566-5
Publisher version: http://dx.doi.org/10.1007/s11222-015-9566-5
Additional information: © The Author(s) 2015. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the sources are credited.
Keywords: Cluster validation, Mixture model, Distance-based clustering, Markov chain, Mixed-type data, Spatial autocorrelation, Presence-absence data
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/1467288