Zhu, R; Guo, Y; Xue, J-H (2020) Adjusting the imbalance ratio by the dimensionality of imbalanced data. Pattern Recognition Letters. DOI: 10.1016/j.patrec.2020.03.004.
Abstract
Class-imbalance extent metrics measure how imbalanced the data are. In pattern classification, it is usually expected that the higher the imbalance extent, the worse the classification performance, and thus an appropriate imbalance extent metric should show a negative correlation with the classification performance. Existing metrics, such as the popular imbalance ratio (IR), only consider the effect of the sample sizes of different classes. However, we note that the dimensionality of imbalanced data also affects the classification performance. Datasets with the same IR can show distinct classification performances when their dimensionalities differ, which makes IR a suboptimal reflection of the imbalance extent for classification. We also observe that the classification performance improves with more discriminative features. Inspired by these observations, we propose a new imbalance extent metric, the adjusted IR, which adds a penalty term based on the number of discriminative features; this number is effectively determined by the Pearson correlation test. The adjusted IR adaptively revises the IR when the number of discriminative features varies. The empirical studies demonstrate the effectiveness of the adjusted IR, in terms of its stronger negative correlation with the classification performance.
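The two ingredients named in the abstract, the IR and the number of discriminative features found by a Pearson correlation test, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the penalty formula, so the sketch stops short of combining the two quantities. The IR is taken in its common form (largest class size divided by smallest class size), the function names and the `alpha=0.05` threshold are hypothetical, and the p-value uses the large-sample Fisher z approximation in place of an exact test so that the code needs only the standard library.

```python
import math
from collections import Counter
from statistics import NormalDist


def imbalance_ratio(labels):
    """IR in its common form: largest class size / smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def pearson_p_value(r, n):
    """Two-sided p-value from the Fisher z approximation (assumes n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_discriminative_features(rows, labels, alpha=0.05):
    """Count features whose correlation with a 0/1 label is significant.

    alpha is an assumed significance level, not a value from the paper.
    """
    n = len(rows)
    count = 0
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if pearson_p_value(pearson_r(column, labels), n) < alpha:
            count += 1
    return count


# Toy data: 90 majority samples (label 0) and 10 minority samples (label 1).
# Feature 0 tracks the label; feature 1 alternates independently of it.
labels = [0] * 90 + [1] * 10
rows = [[2.0 * y + 0.1 * (i % 3), float(i % 2)] for i, y in enumerate(labels)]

print(imbalance_ratio(labels))                      # 9.0
print(count_discriminative_features(rows, labels))  # 1
```

On this toy set the IR is 9 regardless of how many features are added, while the count of discriminative features changes with the feature set, which is the gap the adjusted IR is described as addressing.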
Type: | Article |
---|---|
Title: | Adjusting the imbalance ratio by the dimensionality of imbalanced data |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1016/j.patrec.2020.03.004 |
Publisher version: | http://dx.doi.org/10.1016/j.patrec.2020.03.004 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
Keywords: | Imbalanced data, imbalance extent, imbalanced learning, imbalance ratio, Pearson correlation test |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10092923 |