Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes

Akhanli, SE; Hennig, C (2020) Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes. Statistics and Computing, 30, pp. 1523-1544. 10.1007/s11222-020-09958-2. Green open access

Abstract

A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice of such criteria depends on the context and aim of clustering. Therefore, researchers need to consider what data analytic characteristics the clusters they are aiming at are supposed to have, such as within-cluster homogeneity, between-clusters separation, and stability. Here, a set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature. Users can choose the indexes that are relevant to the application at hand. In order to measure the overall quality of a clustering (for comparing clusterings from different methods and/or different numbers of clusters), the index values are calibrated for aggregation. Calibration is relative to a set of random clusterings on the same data. Two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.
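
The calibrate-then-aggregate idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two indexes (`within_homogeneity`, `centroid_separation`), the uniform random labelling, the z-score calibration against random clusterings with the same number of clusters, and the weighted mean used for aggregation are all assumptions made here for illustration; the abstract does not specify any of these details.

```python
import numpy as np

def within_homogeneity(X, labels):
    """Hypothetical index: negative mean distance to the own-cluster centroid (larger is better)."""
    total = 0.0
    for k in np.unique(labels):
        pts = X[labels == k]
        total += np.linalg.norm(pts - pts.mean(axis=0), axis=1).sum()
    return -total / len(X)

def centroid_separation(X, labels):
    """Hypothetical index: smallest distance between two cluster centroids (larger is better)."""
    cents = np.array([X[labels == k].mean(axis=0) for k in np.unique(labels)])
    if len(cents) < 2:
        return 0.0
    d = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    return d[np.triu_indices(len(cents), k=1)].min()

def random_labels(n, k, rng):
    """Assumed random clustering: uniform labels, with every cluster forced to be non-empty."""
    labels = rng.integers(0, k, size=n)
    labels[rng.permutation(n)[:k]] = np.arange(k)
    return labels

def calibrated_aggregate(X, candidates, indexes, weights=None, n_random=100, seed=0):
    """Z-score each index against random clusterings on the same data, then take a weighted mean."""
    rng = np.random.default_rng(seed)
    n = len(X)
    weights = np.ones(len(indexes)) if weights is None else np.asarray(weights, dtype=float)
    scores = {}
    for name, labels in candidates.items():
        k = len(np.unique(labels))
        rand = [random_labels(n, k, rng) for _ in range(n_random)]
        z = []
        for index in indexes:
            ref = np.array([index(X, r) for r in rand])
            sd = ref.std()
            # Calibration step (assumed form): position relative to the random clusterings.
            z.append((index(X, labels) - ref.mean()) / sd if sd > 0 else 0.0)
        scores[name] = float(np.dot(weights, z) / weights.sum())
    return scores

# Usage: two well-separated Gaussian blobs, compared under two candidate clusterings.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])
candidates = {
    "true_split": np.repeat([0, 1], 50),
    "alternating": np.arange(100) % 2,
}
scores = calibrated_aggregate(X, candidates, [within_homogeneity, centroid_separation])
best = max(scores, key=scores.get)
```

Calibration puts indexes with different scales onto a common footing before they are combined, and the `weights` argument stands in for the user's choice of which clustering characteristics matter in a given application.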

Type: Article
Title: Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s11222-020-09958-2
Publisher version: https://doi.org/10.1007/s11222-020-09958-2
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Science & Technology, Technology, Physical Sciences, Computer Science, Theory & Methods, Statistics & Probability, Computer Science, Mathematics, Number of clusters, Random clustering, Within-cluster homogeneity, Between-clusters separation, Cluster stability
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/10118927