A simulation study to compare robust clustering methods based on mixtures.
ADV DATA ANAL CLASSI
The following mixture model-based clustering methods are compared in a simulation study with one-dimensional data, a fixed number of clusters, and a focus on outliers and uniform "noise": a maximum likelihood estimator (MLE) for Gaussian mixtures; an MLE for a mixture of Gaussians and a uniform distribution (interpreted as a "noise component" to catch outliers); an MLE for a mixture of Gaussian distributions in which a uniform distribution over the range of the data is fixed (Fraley and Raftery in Comput J 41:578-588, 1998); a pseudo-MLE for a Gaussian mixture with an improper fixed constant over the real line to catch "noise" (RIMLE; Hennig in Ann Stat 32(4):1313-1340, 2004); and MLEs for mixtures of t-distributions with and without estimation of the degrees of freedom (McLachlan and Peel in Stat Comput 10(4):339-348, 2000). The RIMLE (using a method to choose the fixed constant first proposed in Coretto, The noise component in model-based clustering. PhD thesis, Department of Statistical Science, University College London, 2008) is the best method in some simulation setups and acceptable in all of them, and can therefore be recommended.
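To make one of the compared methods concrete, the following is a minimal sketch of an EM algorithm for a one-dimensional Gaussian mixture with a uniform noise component fixed over the range of the data, in the spirit of the Fraley and Raftery approach named in the abstract. It is not the code used in the study. The function name `em_gaussian_mixture_with_noise`, the quantile-based initialisation, and the variance floor are all assumptions made for illustration only.

```python
import numpy as np

def em_gaussian_mixture_with_noise(x, k, n_iter=200):
    """Sketch: EM for k Gaussians plus a uniform noise component (1-D).

    The noise density is fixed at 1 / (max(x) - min(x)) and is not
    estimated; only its mixing proportion is updated.
    Returns (pi, mu, sigma, resp), where pi[0] and resp[:, 0] refer to noise.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    noise_density = 1.0 / (x.max() - x.min())

    # Initialisation (assumption, not from the source): means at evenly
    # spaced quantiles, a common pooled scale, equal mixing proportions.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    sigma = np.full(k, x.std())
    pi = np.full(k + 1, 1.0 / (k + 1))

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        dens = np.empty((n, k + 1))
        dens[:, 0] = pi[0] * noise_density
        z = (x[:, None] - mu[None, :]) / sigma[None, :]
        dens[:, 1:] = pi[1:] * np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: weighted updates for proportions, means and scales.
        nk = resp.sum(axis=0)
        pi = nk / n
        w = resp[:, 1:]
        mu = (w * x[:, None]).sum(axis=0) / (nk[1:] + 1e-12)
        var = (w * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / (nk[1:] + 1e-12)
        # Crude variance floor (assumption) to avoid degenerate components.
        sigma = np.sqrt(np.maximum(var, 1e-6))

    return pi, mu, sigma, resp
```

A point can then be labelled as noise when `resp[:, 0]` is the largest entry in its row. The RIMLE described in the abstract differs in that the constant is an improper density over the whole real line chosen in advance, rather than the reciprocal of the data range.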
|Keywords:||Model-based clustering, Gaussian mixture, Mixture of t-distributions, Noise component, Location-scale mixtures, Maximum likelihood, EM algorithm, t-distribution, Estimators|
|UCL classification:||UCL > School of BEAMS > Faculty of Maths and Physical Sciences; UCL > School of BEAMS > Faculty of Maths and Physical Sciences > Statistical Science|