UCL Discovery

An Efficient End-to-End Training Approach for Zero-Shot Human-AI Coordination

Yan, X; Guo, J; Lou, X; Wang, J; Zhang, H; Du, Y; (2023) An Efficient End-to-End Training Approach for Zero-Shot Human-AI Coordination. In: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). NeurIPS Proceedings: New Orleans, LA, USA.
Green open access

NeurIPS-2023-an-efficient-end-to-end-training-approach-for-zero-shot-human-ai-coordination-Paper-Conference.pdf - Published Version


Abstract

The goal of zero-shot human-AI coordination is to develop an agent capable of collaborating with humans without relying on human data. Prevailing two-stage population-based methods require a diverse population of mutually distinct policies to simulate diverse human behaviors. The necessity of such populations severely limits their computational efficiency. To address this issue, we propose E3T, an Efficient End-to-End Training approach for zero-shot human-AI coordination. E3T employs a mixture of ego policy and random policy to construct the partner policy, making it both skilled in coordination and diverse. In this way, the ego agent is trained end-to-end with this mixture policy, eliminating the need for a pre-trained population and thus significantly improving training efficiency. In addition, we introduce a partner modeling module designed to predict the partner's actions based on historical contexts. With the predicted partner's action, the ego policy can adapt its strategy and take actions accordingly when collaborating with humans exhibiting different behavior patterns. Empirical results on the Overcooked environment demonstrate that our method substantially improves training efficiency while achieving performance comparable or superior to that of the population-based baselines. Demo videos are available at https://sites.google.com/view/e3t-overcooked.
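
The mixture partner policy described in the abstract can be illustrated with a minimal sketch. The abstract does not state the mixing weight, the action space, or how the mixture is sampled, so the parameter `epsilon`, the function names, and the four-action example below are illustrative assumptions rather than the authors' implementation. The sketch blends the ego policy's action distribution with a uniform random policy and samples a partner action from the result.

```python
import random


def mixture_partner_policy(ego_probs, epsilon):
    """Blend an ego action distribution with a uniform random policy.

    ego_probs: list of action probabilities from the ego policy (sums to 1).
    epsilon:   assumed mixing weight on the random policy, in [0, 1].
    """
    n = len(ego_probs)
    # Each action keeps (1 - epsilon) of its ego probability and
    # receives an equal share of the epsilon mass from the uniform policy.
    return [(1.0 - epsilon) * p + epsilon / n for p in ego_probs]


def sample_partner_action(ego_probs, epsilon, rng):
    """Sample one partner action index from the mixture distribution."""
    probs = mixture_partner_policy(ego_probs, epsilon)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical ego policy that always picks action 0 out of four actions.
    ego = [1.0, 0.0, 0.0, 0.0]
    print(mixture_partner_policy(ego, 0.4))
    print(sample_partner_action(ego, 0.4, rng))
```

With `epsilon = 0` the partner simply copies the ego policy (plain self-play), and with `epsilon = 1` it acts uniformly at random; intermediate values trade coordination skill against behavioral diversity, which is the balance the abstract attributes to the mixture.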

Type: Proceedings paper
Title: An Efficient End-to-End Training Approach for Zero-Shot Human-AI Coordination
Event: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
Open access status: An open access version is available from UCL Discovery
Publisher version: https://papers.nips.cc/paper_files/paper/2023/hash...
Language: English
Additional information: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10192033
