UCL Discovery

NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers

Greco, Salvatore; Zhou, Ke; Capra, Licia; Cerquitelli, Tania; Quercia, Daniele; (2024) NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers. Proceedings of the ACM on Human-Computer Interaction (PACM HCI), 2024. (In press). Green open access

Text: XNLP_nokia_intern_CSCW_2024___Arxiv.pdf - Accepted Version (1MB)

Abstract

AI regulations are expected to prohibit machine learning models from using sensitive attributes during training. However, the latest Natural Language Processing (NLP) classifiers, which rely on deep learning, operate as black-box systems, complicating the detection and remediation of such misuse. Traditional bias-mitigation methods in NLP aim for comparable performance across groups defined by attributes such as gender or race, but they do not address the underlying issue: the classifier's reliance on protected attributes. To address this, we introduce NLPGuard, a framework for mitigating the reliance on protected attributes in NLP classifiers. NLPGuard takes an unlabeled dataset, an existing NLP classifier, and its training data as input, and produces a modified training dataset that significantly reduces dependence on protected attributes without compromising accuracy. We apply NLPGuard to three classification tasks: identifying toxic language, sentiment analysis, and occupation classification. Our evaluation shows that current NLP classifiers depend heavily on protected attributes, with up to 23% of their most predictive words associated with such attributes. NLPGuard reduces this reliance by up to 79% while slightly improving accuracy.
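To make the pipeline described above concrete, the sketch below illustrates the general idea in Python: find the classifier's most predictive words, flag those relating to protected attributes, and produce a modified training set with those words masked. This is a hypothetical illustration, not the authors' implementation; the word list PROTECTED_TERMS, the function names most_predictive_words and mitigate, and the frequency-count stand-in for model explanations are all assumptions (the paper's framework derives importance from the classifier itself and uses human or LLM annotation to identify protected-attribute words).

from collections import Counter
from typing import Iterable

# Hypothetical list for illustration only; NLPGuard identifies such words
# dynamically via annotation rather than from a fixed lexicon.
PROTECTED_TERMS = {"gay", "muslim", "black", "woman"}

def most_predictive_words(texts: Iterable[str], top_k: int = 50) -> list[str]:
    """Stand-in for the explanation step: here, simply the most frequent
    tokens. The actual framework uses model explanations of predictions."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    return [w for w, _ in counts.most_common(top_k)]

def mitigate(train_texts: list[str]) -> list[str]:
    """Stand-in for the mitigation step: mask flagged protected-attribute
    words, yielding a modified training set for retraining the classifier."""
    flagged = set(most_predictive_words(train_texts)) & PROTECTED_TERMS
    return [" ".join("[MASK]" if tok.lower() in flagged else tok
                     for tok in t.split())
            for t in train_texts]

if __name__ == "__main__":
    corpus = ["you are a horrible woman", "have a nice day"]
    print(mitigate(corpus))  # ['you are a horrible [MASK]', 'have a nice day']

Retraining on the masked corpus is what reduces the classifier's dependence on protected attributes; the abstract reports that this can be done without compromising, and in fact slightly improving, accuracy.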

Type: Article
Title: NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers
Open access status: An open access version is available from UCL Discovery
Publisher version: https://dl.acm.org/journal/pacmhci/
Language: English
Additional information: This version is the author-accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: protected attributes, bias, fairness, natural language processing, toxic language, large language models, crowdsourcing
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10194051
