Saffari, Hamidreza;
Shafiei, Mohammadamin;
Zhang, Hezhao;
Harris, Lasana T;
Moosavi, Nafise Sadat;
(2025)
Beyond Hate Speech: NLP’s Challenges and Opportunities in Uncovering Dehumanizing Language.
In: Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet, (eds.)
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.
(pp. 26965-26980).
Association for Computational Linguistics: Suzhou, China.
Text: Saffari_et_al_2025.pdf - Published Version (840kB)
Abstract
Dehumanization, i.e., denying human qualities to individuals or groups, is a particularly harmful form of hate speech that can normalize violence against marginalized communities. Despite advances in NLP for detecting general hate speech, approaches to identifying dehumanizing language remain limited due to scarce annotated data and the subtle nature of such expressions. In this work, we systematically evaluate four state-of-the-art large language models (LLMs), Claude, GPT, Mistral, and Qwen, for dehumanization detection. Our results show that only one model, Claude, achieves strong performance (over 80% F1) under an optimized configuration, while the others, despite their capabilities, perform only moderately. Performance drops further when distinguishing dehumanization from related hate types such as derogation. We also identify systematic disparities across target groups: models tend to over-predict dehumanization for some identities (e.g., Gay men) while under-identifying it for others (e.g., Refugees). These findings motivate the need for systematic, group-level evaluation when applying pretrained language models to dehumanization detection tasks.
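The abstract's call for group-level evaluation can be illustrated with a small sketch. The code below is hypothetical (not taken from the paper): it computes a binary F1 score separately for each target group, which is how per-group disparities like those described above would surface. The group names and toy labels are illustrative assumptions.

```python
# Hypothetical sketch: per-group F1 for a binary dehumanization-detection task.
# Group names and labels below are toy data, not results from the paper.
from collections import defaultdict

def f1_score(gold, pred):
    """Binary F1: harmonic mean of precision and recall on the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def group_level_f1(examples):
    """Compute F1 separately for each target group to expose disparities."""
    by_group = defaultdict(lambda: ([], []))
    for group, gold, pred in examples:
        by_group[group][0].append(gold)
        by_group[group][1].append(pred)
    return {g: f1_score(gold, pred) for g, (gold, pred) in by_group.items()}

# Toy examples: (target group, gold label, model prediction).
examples = [
    ("refugees", 1, 0), ("refugees", 1, 1), ("refugees", 0, 0),
    ("gay_men", 0, 1), ("gay_men", 1, 1), ("gay_men", 1, 1),
]
print(group_level_f1(examples))
```

A single aggregate F1 over these six examples would hide that the model misses positives for one group while over-predicting for another; reporting the per-group scores makes that gap visible.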