Xue, J; Hall, P; (2015) Why Does Rebalancing Class-unbalanced Data Improve AUC for Linear Discriminant Analysis? IEEE Transactions on Pattern Analysis and Machine Intelligence, 37 (5) pp. 1109-1112. 10.1109/TPAMI.2014.2359660.
Abstract
Many established classifiers fail to identify the minority class when it is much smaller than the majority class. To tackle this problem, researchers often first rebalance the class sizes in the training dataset, through oversampling the minority class or undersampling the majority class, and then use the rebalanced data to train the classifiers. This leads to interesting empirical patterns. In particular, using the rebalanced training data can often improve the area under the receiver operating characteristic curve (AUC) for the original, unbalanced test data. The AUC is a widely-used quantitative measure of classification performance, but the property that it increases with rebalancing has, as yet, no theoretical explanation. In this note, using Gaussian-based linear discriminant analysis (LDA) as the classifier, we demonstrate that, at least for LDA, there is an intrinsic, positive relationship between the rebalancing of class sizes and the improvement of AUC. We show that the largest improvement of AUC is achieved, asymptotically, when the two classes are fully rebalanced to be of equal sizes.
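The workflow described in the abstract (rebalance the training data, fit LDA, then measure AUC on the original unbalanced test data) can be illustrated with a minimal sketch. This is not the authors' code. The toy Gaussian parameters, the sample sizes, the choice of undersampling, and the helper names `fit_lda_direction`, `auc`, `undersample_majority`, and `simulate` are all assumptions made for illustration. The two classes are given different covariance matrices here, also as an assumption, because the pooled covariance estimate in LDA is weighted by class size and so changes when the classes are rebalanced.

```python
import numpy as np

def fit_lda_direction(X, y):
    """Return the LDA projection vector w = S_pooled^{-1} (mu1 - mu0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance; each class contributes in
    # proportion to its sample size.
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    return np.linalg.solve(S, mu1 - mu0)

def auc(scores, y):
    """AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n1 = int(y.sum())
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

def undersample_majority(X, y, rng):
    """Randomly drop majority rows (label 0) until the classes are equal in size."""
    idx0 = np.flatnonzero(y == 0)
    idx1 = np.flatnonzero(y == 1)
    keep0 = rng.choice(idx0, size=len(idx1), replace=False)
    keep = np.concatenate([keep0, idx1])
    return X[keep], y[keep]

def simulate(n0, n1, rng):
    """Toy data (assumed): majority ~ N(0, I), minority with a different covariance."""
    X0 = rng.normal(size=(n0, 2))
    X1 = rng.normal(size=(n1, 2)) * np.array([3.0, 0.5]) + np.array([1.0, 1.0])
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = simulate(5000, 100, rng)    # unbalanced training set
X_test, y_test = simulate(50000, 1000, rng)    # unbalanced test set

# Fit LDA on the original training data and on an undersampled copy.
w_unbalanced = fit_lda_direction(X_train, y_train)
X_bal, y_bal = undersample_majority(X_train, y_train, rng)
w_balanced = fit_lda_direction(X_bal, y_bal)

# Both fits are evaluated on the same unbalanced test data.
auc_unbalanced = auc(X_test @ w_unbalanced, y_test)
auc_balanced = auc(X_test @ w_balanced, y_test)
print(f"AUC, trained on unbalanced data: {auc_unbalanced:.3f}")
print(f"AUC, trained on rebalanced data: {auc_balanced:.3f}")
```

Only the projection direction is compared, since AUC depends on the ranking of the scores and not on the classification threshold. Oversampling the minority class could be substituted for `undersample_majority` without changing the rest of the sketch.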
Field | Value |
---|---|
Type: | Article |
Title: | Why Does Rebalancing Class-unbalanced Data Improve AUC for Linear Discriminant Analysis? |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1109/TPAMI.2014.2359660 |
Publisher version: | http://dx.doi.org/10.1109/TPAMI.2014.2359660 |
Language: | English |
Additional information: | Copyright © 2014 The Author(s). This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/ |
Keywords: | Training, Training data, Covariance matrices, Vectors, Educational institutions, Data mining, Linear discriminant analysis |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/1448839 |