TY - INPR
CY - New York, NY, United States
A1 - Hort, M
A1 - Zhang, J
A1 - Sarro, F
A1 - Harman, M
N2 - The increasingly wide uptake of Machine Learning (ML) has raised the significance of the problem of tackling bias (i.e., unfairness), making it a primary software engineering concern. In this paper, we introduce Fairea, a model behaviour mutation approach to benchmarking ML bias mitigation methods. We also report on a large-scale empirical study to test the effectiveness of 12 widely-studied bias mitigation methods. Our results reveal that, surprisingly, bias mitigation methods have a poor effectiveness in 49% of the cases. In particular, 15% of the mitigation cases have worse fairness-accuracy trade-offs than the baseline established by Fairea; 34% of the cases have a decrease in accuracy and an increase in bias. Fairea has been made publicly available for software engineers and researchers to evaluate their bias mitigation methods.
ID - discovery10130186
PB - Association for Computing Machinery
UR - https://doi.org/10.6084/m9.figshare.13712827.v2
N1 - This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions.
TI - Fairea: A Model Behaviour Mutation Approach to Benchmarking Bias Mitigation Methods
AV - public
Y1 - 2021/08/23/
SP - 994
EP - 1006
ER -