UCL Discovery

Adjusting the imbalance ratio by the dimensionality of imbalanced data

Zhu, R; Guo, Y; Xue, J-H; (2020) Adjusting the imbalance ratio by the dimensionality of imbalanced data. Pattern Recognition Letters. DOI: 10.1016/j.patrec.2020.03.004. Green open access.

PRL-RuiZhu-AIR-R1.pdf - Accepted Version


Abstract

Class-imbalance extent metrics measure how imbalanced a dataset is. In pattern classification, a higher imbalance extent is usually expected to produce worse classification performance, so an appropriate imbalance extent metric should correlate negatively with classification performance. Existing metrics, such as the popular imbalance ratio (IR), consider only the sample sizes of the different classes. However, we note that the dimensionality of imbalanced data also affects classification performance: datasets with the same IR can show distinct classification performance when their dimensionalities differ, which makes IR a suboptimal reflection of the imbalance extent for classification. We also observe that classification performance improves with more discriminative features. Inspired by these observations, we propose a new imbalance extent metric, the adjusted IR, which adds a penalty term based on the number of discriminative features, a number determined effectively by the Pearson correlation test. The adjusted IR adaptively revises the IR as the number of discriminative features varies. Empirical studies demonstrate the effectiveness of the adjusted IR in terms of its stronger negative correlation with classification performance.
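
The two ingredients named in the abstract, the IR and a count of discriminative features selected by a Pearson correlation test, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names, the toy data, the 0.05 significance level, and the use of a normal approximation to the test statistic's reference distribution (reasonable only for moderately large samples). The abstract does not state the form of the penalty term, so the sketch stops at the two inputs and does not compute an adjusted IR.

```python
import math
from collections import Counter
from statistics import NormalDist


def imbalance_ratio(labels):
    """IR: size of the largest class divided by size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return sxy / math.sqrt(sxx * syy)


def count_discriminative_features(rows, labels, alpha=0.05):
    """Count features whose correlation with the class label is significant.

    Uses the statistic t = r * sqrt((n - 2) / (1 - r^2)) and compares it with
    a two-sided normal critical value, an approximation to the t distribution.
    """
    n = len(labels)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    count = 0
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        r = pearson_r(column, labels)
        denom = 1 - r * r
        if denom <= 0:  # perfect correlation: certainly significant
            count += 1
            continue
        t = r * math.sqrt((n - 2) / denom)
        if abs(t) > z_crit:
            count += 1
    return count


# Hypothetical toy data: 90 majority samples (label 0), 10 minority (label 1).
labels = [0] * 90 + [1] * 10
rows = [
    [2.0 * y + 0.1 * (i % 3),  # feature 0 shifts with the class label
     float(i % 2)]             # feature 1 alternates regardless of class
    for i, y in enumerate(labels)
]

ir = imbalance_ratio(labels)
n_discriminative = count_discriminative_features(rows, labels)
print(f"IR = {ir}, discriminative features = {n_discriminative} of {len(rows[0])}")
```

In this toy dataset the IR depends only on the 90:10 class split, while the feature count depends on the columns, which is the distinction the abstract draws between sample-size information and dimensionality information.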

Type: Article
Title: Adjusting the imbalance ratio by the dimensionality of imbalanced data
Open access status: An open access version is available from UCL Discovery
DOI: 10.1016/j.patrec.2020.03.004
Publisher version: http://dx.doi.org/10.1016/j.patrec.2020.03.004
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Imbalanced data, imbalance extent, imbalanced learning, imbalance ratio, Pearson correlation test
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/10092923