Ramesh, Shyam Sundhar; Hu, Yifan; Chaimalas, Iason; Mehta, Viraj; Sessa, Pier Giuseppe; Ammar, Haitham Bou; Bogunovic, Ilija (2024) Group Robust Preference Optimization in Reward-free RLHF. In: Advances in Neural Information Processing Systems (NeurIPS 2024). NeurIPS: Vancouver, Canada.
Abstract
Adapting large language models (LLMs) to specific tasks usually involves fine-tuning through reinforcement learning from human feedback (RLHF) on preference data. While these data often come from diverse groups of labelers (e.g., different demographics, ethnicities, company teams, etc.), traditional RLHF approaches adopt a "one-size-fits-all" strategy: they indiscriminately assume and optimize a single preference model, and are therefore not robust to the unique characteristics and needs of the various groups. To address this limitation, we propose a novel Group Robust Preference Optimization (GRPO) method to robustly align LLMs to the preferences of individual groups. Our approach builds upon reward-free direct preference optimization methods, but unlike previous approaches it seeks a robust policy that maximizes worst-case group performance. To achieve this, GRPO adaptively and sequentially weights the importance of different groups, prioritizing groups with worse cumulative loss. We theoretically study the feasibility of GRPO and analyze its convergence for the log-linear policy class. By fine-tuning LLMs with GRPO on diverse group-based global opinion data, we significantly improve performance for the worst-performing groups, reduce loss imbalances across groups, and improve probability accuracy compared to non-robust baselines.
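The adaptive reweighting the abstract describes — prioritizing groups with worse cumulative loss — can be sketched as an exponentiated-gradient update on group weights. This is a minimal illustration only: the function name, step size, and toy per-group losses below are ours, not the paper's implementation.

```python
import math

def update_group_weights(cum_losses, step_size=0.5):
    """Exponentiated-gradient (softmax) update: groups with larger
    cumulative loss receive proportionally larger weight.
    Hypothetical sketch, not the authors' code."""
    exps = [math.exp(step_size * loss) for loss in cum_losses]
    z = sum(exps)
    return [e / z for e in exps]

# Toy run: three labeler groups; group 2 accumulates the worst loss,
# so its weight should grow over the iterations.
cum_losses = [0.0, 0.0, 0.0]
per_step_losses = [0.2, 0.5, 0.9]  # hypothetical per-group preference losses
for _ in range(5):
    cum_losses = [c + l for c, l in zip(cum_losses, per_step_losses)]
    weights = update_group_weights(cum_losses)

assert weights[2] > weights[1] > weights[0]
```

In the full method these weights would multiply each group's preference-optimization loss when updating the policy, so training effort shifts toward the worst-performing group.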
Type: | Proceedings paper |
---|---|
Title: | Group Robust Preference Optimization in Reward-free RLHF |
Event: | 38th Conference on Neural Information Processing Systems (NeurIPS 2024) |
Open access status: | An open access version is available from UCL Discovery |
Publisher version: | https://openreview.net/forum?id=PRAsjrmXXK |
Language: | English |
Additional information: | This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions. |
Keywords: | RLHF, DPO, Robust Alignment |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng |
URI: | https://discovery.ucl.ac.uk/id/eprint/10199800 |