eprintid: 10154830
rev_number: 9
eprint_status: archive
userid: 699
dir: disk0/10/15/48/30
datestamp: 2022-09-01 11:50:09
lastmod: 2022-09-01 11:50:09
status_changed: 2022-09-01 11:50:09
type: proceedings_section
metadata_visibility: show
sword_depositor: 699
creators_name: Feng, X
creators_name: Slumbers, O
creators_name: Wan, Z
creators_name: Liu, B
creators_name: McAleer, S
creators_name: Wen, Y
creators_name: Wang, J
creators_name: Yang, Y
title: Neural Auto-Curricula in Two-Player Zero-Sum Games
ispublished: pub
divisions: C05
divisions: F48
divisions: B04
divisions: UCL
note: This version is the version of record. For information on re-use, please refer to the publisher's terms and conditions.
abstract: When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game-theoretical principles such as fictitious play and Double Oracle. In this paper, we introduce a novel framework, Neural Auto-Curricula (NAC), that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterise the opponent-selection module by neural networks and the best-response module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve performance competitive with, or better than, state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction of discovering general MARL algorithms solely from data.
date: 2021
date_type: published
official_url: https://proceedings.neurips.cc/paper/2021/hash/1cd73be1e256a7405516501e94e892ac-Abstract.html
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 1962500
isbn_13: 9781713845393
lyricists_name: Wang, Jun
lyricists_id: JWANG00
actors_name: Flynn, Bernadette
actors_id: BFFLY94
actors_role: owner
full_text_status: public
pres_type: paper
publication: Advances in Neural Information Processing Systems
volume: 5
pagerange: 3504-3517
event_title: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
issn: 1049-5258
book_title: Advances in Neural Information Processing Systems
citation: Feng, X; Slumbers, O; Wan, Z; Liu, B; McAleer, S; Wen, Y; Wang, J; Yang, Y; (2021) Neural Auto-Curricula in Two-Player Zero-Sum Games. In: Advances in Neural Information Processing Systems. (pp. 3504-3517).
document_url: https://discovery.ucl.ac.uk/id/eprint/10154830/1/NeurIPS-2021-neural-auto-curricula-in-two-player-zero-sum-games-Paper.pdf
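
The abstract describes NAC only at a high level: a neural opponent-selection module, a differentiable best-response subroutine, and a meta-gradient outer loop that minimises exploitability. The following is a minimal, hypothetical sketch of that idea in JAX, not the authors' implementation; the function names (meta_solver, best_response, exploitability), the tiny per-member MLP, the gradient-ascent stand-in for the best-response oracle, and all hyper-parameters are illustrative assumptions. It trains on random antisymmetric (zero-sum) matrix games and differentiates through the whole population-growth loop.

    # Hypothetical sketch of NAC-style meta-gradient learning (assumptions noted above).
    import jax
    import jax.numpy as jnp

    N_ACTIONS = 5     # actions in each random zero-sum matrix game
    N_ITERS = 4       # population-growth iterations per game
    BR_STEPS = 20     # inner gradient steps for the differentiable best response
    BR_LR = 1.0
    META_LR = 0.05

    def payoff(game, x, y):
        # Expected payoff of row strategy x against column strategy y.
        return x @ game @ y

    def meta_solver(theta, meta_payoffs):
        # Hypothetical opponent-selection module: summarise each population
        # member's row of pairwise payoffs into simple features, score it with
        # a tiny MLP, and softmax the scores into an opponent mixture.
        feats = jnp.stack([meta_payoffs.mean(axis=1),
                           meta_payoffs.max(axis=1),
                           meta_payoffs.min(axis=1)], axis=-1)      # (k, 3)
        hidden = jnp.tanh(feats @ theta["w1"] + theta["b1"])        # (k, 8)
        scores = (hidden @ theta["w2"] + theta["b2"]).squeeze(-1)   # (k,)
        return jax.nn.softmax(scores)

    def best_response(game, population, mixture):
        # Differentiable stand-in for the best-response oracle: a few softmax
        # gradient-ascent steps against the selected opponent mixture.
        opp = mixture @ population
        def step(logits, _):
            grad = jax.grad(lambda l: payoff(game, jax.nn.softmax(l), opp))(logits)
            return logits + BR_LR * grad, None
        logits, _ = jax.lax.scan(step, jnp.zeros(N_ACTIONS), None, length=BR_STEPS)
        return jax.nn.softmax(logits)

    def exploitability(game, population, mixture):
        # Gain of a fully informed opponent against the aggregated mixture
        # (the game matrix is antisymmetric, so the self-play value is zero).
        return jnp.max(game @ (mixture @ population))

    def meta_loss(theta, game):
        # Grow a population for N_ITERS iterations, then measure exploitability.
        population = jnp.eye(N_ACTIONS)[:1]     # start from one pure strategy
        for _ in range(N_ITERS):
            meta_payoffs = population @ game @ population.T
            mixture = meta_solver(theta, meta_payoffs)
            population = jnp.vstack([population, best_response(game, population, mixture)])
        final_mixture = meta_solver(theta, population @ game @ population.T)
        return exploitability(game, population, final_mixture)

    @jax.jit
    def meta_update(theta, game):
        # Outer loop step: meta-gradient descent on exploitability.
        loss, grads = jax.value_and_grad(meta_loss)(theta, game)
        theta = jax.tree_util.tree_map(lambda p, g: p - META_LR * g, theta, grads)
        return theta, loss

    key = jax.random.PRNGKey(0)
    theta = {"w1": jax.random.normal(key, (3, 8)) * 0.1, "b1": jnp.zeros(8),
             "w2": jax.random.normal(key, (8, 1)) * 0.1, "b2": jnp.zeros(1)}
    for i in range(200):
        key, sub = jax.random.split(key)
        r = jax.random.normal(sub, (N_ACTIONS, N_ACTIONS))
        theta, loss = meta_update(theta, r - r.T)   # random antisymmetric game
        if i % 50 == 0:
            print(f"meta-step {i}: exploitability {float(loss):.3f}")

In the paper the best-response module is a full (reinforcement-learning or gradient) optimisation subroutine and the opponent-selection network is richer than this per-member MLP; the sketch only illustrates how exploitability can be back-propagated through both modules to learn the curriculum itself.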