UCL Discovery

Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction

Straw, Isabel; Wu, Honghan; (2022) Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction. BMJ Health & Care Informatics, 29(1). 10.1136/bmjhci-2021-100457. Green open access.

File: e100457.full.pdf - Published Version (1MB)

Abstract

OBJECTIVES: The Indian Liver Patient Dataset (ILPD) is used extensively to create algorithms that predict liver disease. Given the existing research describing demographic inequities in liver disease diagnosis and management, these algorithms require scrutiny for potential biases. We address this overlooked issue by investigating ILPD models for sex bias.

METHODS: Following our literature review of ILPD papers, the models reported in existing studies are recreated and then interrogated for bias. We define four experiments, training on sex-unbalanced/balanced data, with and without feature selection. We build random forest (RF), support vector machine (SVM), Gaussian Naïve Bayes and logistic regression (LR) classifiers, running each experiment 100 times and reporting average results with SD.

RESULTS: We reproduce published models achieving accuracies of >70% (LR 71.31% (2.37 SD) to SVM 79.40% (2.50 SD)) and demonstrate a previously unobserved performance disparity. Across all classifiers, females suffer from a higher false negative rate (FNR). Presently, RF and LR classifiers are reported as the most effective models, yet in our experiments they demonstrate the greatest FNR disparity (RF: −21.02%; LR: −24.07%).

DISCUSSION: We demonstrate a sex disparity that exists in published ILPD classifiers. In practice, the higher FNR for females would manifest as increased rates of missed diagnosis for female patients and a consequent lack of appropriate care. Our study demonstrates that evaluating biases in the initial stages of machine learning can provide insights into inequalities in current clinical practice, reveal pathophysiological differences between males and females, and mitigate the digitisation of inequalities into algorithmic systems.

CONCLUSION: Our findings are important to medical data scientists, clinicians and policy-makers involved in the implementation of medical artificial intelligence systems. An awareness of the potential biases of these systems is essential in preventing the digital exacerbation of healthcare inequalities.
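The sex-stratified evaluation described in METHODS and RESULTS can be illustrated with a short sketch. The following is not the authors' code: it is a minimal Python/scikit-learn example, assuming an ILPD-style pandas DataFrame with illustrative column names ("Gender", "Disease"), showing how a classifier's false negative rate would be computed separately for male and female patients and averaged over repeated runs.

# Minimal sketch (not the authors' code): sex-stratified false negative
# rate (FNR) evaluation for an ILPD-style classifier. Column names
# "Gender" and "Disease" are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

def fnr(y_true, y_pred):
    """False negative rate: FN / (FN + TP)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fn / (fn + tp) if (fn + tp) else np.nan

def run_once(df, features, seed):
    """Train one LR model and return per-sex FNR on a held-out split."""
    X = df[features].values
    y = df["Disease"].values      # 1 = liver disease, 0 = no disease
    sex = df["Gender"].values     # "Male" / "Female"
    X_tr, X_te, y_tr, y_te, _, sex_te = train_test_split(
        X, y, sex, test_size=0.3, random_state=seed, stratify=y)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    y_hat = model.predict(X_te)
    return {g: fnr(y_te[sex_te == g], y_hat[sex_te == g])
            for g in ("Male", "Female")}

# Mirroring the paper's protocol of 100 repeated runs:
# results = [run_once(df, features, seed=s) for s in range(100)]
# The FNR disparity is then the difference between the mean male FNR
# and the mean female FNR across runs.

A persistent gap between the per-sex FNRs across the 100 runs is the kind of disparity the paper reports; the same loop can be repeated for the RF, SVM and Naïve Bayes classifiers to compare models.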

Type: Article
Title: Investigating for bias in healthcare algorithms: a sex-stratified analysis of supervised machine learning models in liver disease prediction
Location: England
Open access status: An open access version is available from UCL Discovery
DOI: 10.1136/bmjhci-2021-100457
Publisher version: http://dx.doi.org/10.1136/bmjhci-2021-100457
Language: English
Additional information: This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Keywords: Artificial intelligence, BMJ Health Informatics, Health Equity, Machine Learning, Public health informatics, Algorithms, Bayes Theorem, Bias, Delivery of Health Care, Female, Humans, Liver Diseases, Male, Supervised Machine Learning
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
URI: https://discovery.ucl.ac.uk/id/eprint/10148123
Downloads since deposit: 146
