UCL Discovery

Median-Based Classifiers for High-Dimensional Data

Hall, P; Titterington, DM; Xue, JH (2009) Median-Based Classifiers for High-Dimensional Data. Journal of the American Statistical Association, 104 (488), 1597-1608. doi:10.1198/jasa.2009.tm08107.

Full text not available from this repository.

Abstract

Conventional distance-based classifiers use standard Euclidean distance, and so can suffer from excessive volatility if vector components have heavy-tailed distributions. This difficulty can be alleviated by replacing the L-2 distance by its L-1 counterpart. For example, the L-1 version of the popular centroid classifier would allocate a new data value to the population to whose centroid it was closest in L-1 terms. However, this approach can lead to inconsistency, because the centroid is defined using L-2, rather than L-1, distance. In particular, by mixing L-1 and L-2 approaches, we produce a classifier that can seriously misidentify data in cases where the means and medians of marginal distributions take different values. These difficulties motivate replacing centroids by medians. However, in the very-high-dimensional settings commonly encountered today, this can be problematic if we attempt to work with a conventional spatial median. Therefore, we suggest using componentwise medians to construct a robust classifier that is relatively insensitive to the difficulties caused by heavy-tailed data and entails straightforward computation. We also consider generalizations and extensions of this approach based on, for example, using data truncation to achieve additional robustness. Using both empirical and theoretical arguments, we explore the properties of these methods, and show that the resulting classifiers can be particularly effective. Supplementary materials are available online.
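
As an illustration of the basic idea described in the abstract, the following is a minimal sketch of a componentwise-median classifier that allocates a new observation to the class whose median vector is nearest in L-1 distance. It is not the authors' implementation: the function names `fit_componentwise_medians` and `predict_l1` are hypothetical, NumPy is assumed, and the truncation-based extensions mentioned in the abstract are not covered.

```python
import numpy as np


def fit_componentwise_medians(X, y):
    """Return the sorted class labels and one median vector per class.

    X is an (n_samples, n_features) array and y holds the class labels.
    Each median vector is computed feature by feature (componentwise),
    not as a spatial median. (Hypothetical helper, for illustration only.)
    """
    labels = np.unique(y)
    medians = np.vstack([np.median(X[y == k], axis=0) for k in labels])
    return labels, medians


def predict_l1(X_new, labels, medians):
    """Assign each row of X_new to the class with the nearest median in L-1 distance."""
    # dists[i, k] = sum over features j of |X_new[i, j] - medians[k, j]|
    dists = np.abs(X_new[:, None, :] - medians[None, :, :]).sum(axis=2)
    return labels[np.argmin(dists, axis=1)]
```

Because each component is summarized by a median, a single extreme value in the training data (for example one observation of 100 among values near 0) shifts the class representative far less than it would shift a centroid, which is the robustness property the abstract points to.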

Type: Article
Title: Median-Based Classifiers for High-Dimensional Data
DOI: 10.1198/jasa.2009.tm08107
Keywords: Centroid classifier, Componentwise median, Data depth, Distance-based classifier, High-dimensional data, L-1 method, Robust method, Sample median, Spatial median, Strength of dependence, Gene expression, Microarray data, Discriminant analysis, Generalization error, Shrunken centroids, Class prediction, Classification, Cancer, Selection
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: http://discovery.ucl.ac.uk/id/eprint/72379