UCL Discovery

Group Robust Preference Optimization in Reward-free RLHF

Ramesh, Shyam Sundhar; Hu, Yifan; Chaimalas, Iason; Mehta, Viraj; Sessa, Pier Giuseppe; Ammar, Haitham Bou; Bogunovic, Ilija; (2024) Group Robust Preference Optimization in Reward-free RLHF. In: Advances in Neural Information Processing Systems (NeurIPS 2024). NeurIPS (In press).

Robust_DPO_Neurips_CR_version_4.pdf - Accepted Version
Access restricted to UCL open access staff until 9 May 2025.

Type: Proceedings paper
Title: Group Robust Preference Optimization in Reward-free RLHF
Event: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
Publisher version: https://papers.nips.cc/
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng
URI: https://discovery.ucl.ac.uk/id/eprint/10199800