?url_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&rft.title=Neural+Auto-Curricula+in+Two-Player+Zero-Sum+Games&rft.creator=Feng%2C+X&rft.creator=Slumbers%2C+O&rft.creator=Wan%2C+Z&rft.creator=Liu%2C+B&rft.creator=McAleer%2C+S&rft.creator=Wen%2C+Y&rft.creator=Wang%2C+J&rft.creator=Yang%2C+Y&rft.description=When+solving+two-player+zero-sum+games%2C+multi-agent+reinforcement+learning+(MARL)+algorithms+often+create+populations+of+agents+where%2C+at+each+iteration%2C+a+new+agent+is+discovered+as+the+best+response+to+a+mixture+over+the+opponent+population.+Within+such+a+process%2C+the+update+rules+of+%22who+to+compete+with%22+(i.e.%2C+the+opponent+mixture)+and+%22how+to+beat+them%22+(i.e.%2C+finding+best+responses)+are+underpinned+by+manually+developed+game-theoretical+principles+such+as+fictitious+play+and+Double+Oracle.+In+this+paper%2C+we+introduce+a+novel+framework%2C+Neural+Auto-Curricula+(NAC)%2C+that+leverages+meta-gradient+descent+to+automate+the+discovery+of+the+learning+update+rule+without+explicit+human+design.+Specifically%2C+we+parameterise+the+opponent+selection+module+by+neural+networks+and+the+best-response+module+by+optimisation+subroutines%2C+and+update+their+parameters+solely+via+interaction+with+the+game+engine%2C+where+both+players+aim+to+minimise+their+exploitability.+Surprisingly%2C+even+without+human+design%2C+the+discovered+MARL+algorithms+achieve+performance+competitive+with%2C+or+even+better+than%2C+state-of-the-art+population-based+game+solvers+(e.g.%2C+PSRO)+on+Games+of+Skill%2C+differentiable+Lotto%2C+non-transitive+Mixture+Games%2C+Iterated+Matching+Pennies%2C+and+Kuhn+Poker.+Additionally%2C+we+show+that+NAC+is+able+to+generalise+from+small+games+to+large+games%2C+for+example+training+on+Kuhn+Poker+and+outperforming+PSRO+on+Leduc+Poker.+Our+work+inspires+a+promising+future+direction+to+discover+general+MARL+algorithms+solely+from+data.&rft.date=2021&rft.type=Proceedings+paper&rft.language=eng&rft.source=In%3A+Advances+in+Neural+Information+Processing+Systems+(pp.+3504-3517)+(2021)&rft.format=text&rft.identifier=https%3A%2F%2Fdiscovery.ucl.ac.uk%2Fid%2Feprint%2F10154830%2F1%2FNeurIPS-2021-neural-auto-curricula-in-two-player-zero-sum-games-Paper.pdf&rft.identifier=https%3A%2F%2Fdiscovery.ucl.ac.uk%2Fid%2Feprint%2F10154830%2F&rft.rights=open