UCL Discovery

The noise component in model-based clustering.

Coretto, P.; (2008) The noise component in model-based clustering. Doctoral thesis, University of London. Green open access


Abstract

Model-based cluster analysis is a statistical tool used to investigate group structures in data. Finite mixtures of Gaussian distributions are a popular device for modelling elliptically shaped clusters. Estimation of mixtures of Gaussians is usually based on the maximum likelihood method. However, for a wide class of finite mixtures, including Gaussians, maximum likelihood estimates are not robust. This implies that a small proportion of outliers in the data could lead to poor estimates and clustering. One way to deal with this is to add a "noise component", i.e. a mixture component that models the outliers. In this thesis we explore this approach through three contributions. First, Fraley and Raftery (1993) proposed a Gaussian mixture model with the addition of a uniform noise component with support on the data range. We generalize this approach by introducing a model that is a finite mixture of location-scale distributions mixed with a finite number of uniforms supported on disjoint subsets of the data range. We study identifiability and maximum likelihood estimation, and provide a computational procedure based on the EM algorithm. Second, Hennig (2004) proposed a model in which the noise component is represented by a fixed improper density that is constant on the real line. He shows that the resulting estimates are robust to extreme outliers. We define a maximum likelihood type estimator for such a model and study its asymptotic behaviour. We also provide a method for choosing the improper constant density, and a computational procedure based on the EM algorithm. The third contribution is an extensive simulation study in which we measure the performance of the previous two methods and certain other robust methodologies proposed in the literature.
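
To make the idea of a noise component concrete, the following is a minimal sketch, not the procedure developed in the thesis, of an EM algorithm for a one-dimensional mixture of k Gaussians plus a single uniform noise component whose support is fixed to the observed data range. The function name `em_gaussian_uniform_noise`, the quantile-based starting values, the iteration count, and the variance floor are all assumptions made for illustration.

```python
import numpy as np

def em_gaussian_uniform_noise(x, k, n_iter=200):
    """Illustrative EM for k Gaussians plus one uniform noise component (1-D).

    The uniform density is held fixed at 1 / (max(x) - min(x)); only its
    mixing proportion is estimated. Index 0 always refers to the noise.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    noise_density = 1.0 / (x.max() - x.min())  # constant on the data range

    # Assumed starting values: evenly spaced quantiles as means,
    # the pooled standard deviation as scale, equal proportions.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    sigma = np.full(k, x.std())
    pi = np.full(k + 1, 1.0 / (k + 1))

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        dens = np.empty((n, k + 1))
        dens[:, 0] = pi[0] * noise_density
        z = (x[:, None] - mu[None, :]) / sigma[None, :]
        dens[:, 1:] = pi[1:] * np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: weighted updates of proportions, means and scales.
        nk = np.maximum(resp.sum(axis=0), 1e-12)
        pi = nk / n
        mu = (resp[:, 1:] * x[:, None]).sum(axis=0) / nk[1:]
        var = (resp[:, 1:] * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk[1:]
        sigma = np.sqrt(np.maximum(var, 1e-6))  # floor avoids degenerate spikes

    return pi, mu, sigma, resp
```

Points whose posterior weight in column 0 of `resp` is largest would be treated as noise rather than assigned to a cluster. The variance floor is a crude guard against the unbounded likelihood of Gaussian mixtures and is not a substitute for the constrained estimation a careful treatment would use.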

Type: Thesis (Doctoral)
Title: The noise component in model-based clustering.
Identifier: PQ ETD:592537
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by Proquest
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/1445219