Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

Ma, W; Mi, Q; Zeng, Y; Yan, X; Wu, Y; Lin, R; Zhang, H; (2024) Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach. In: Globerson, A and Mackey, L and Belgrave, D and Fan, A and Paquet, U and Tomczak, J and Zhang, C, (eds.) Advances in Neural Information Processing Systems 37. Neural Information Processing Systems Foundation, Inc. (NeurIPS): Vancouver, Canada. Green open access

9835_Large_Language_Models_Pla.pdf - Published Version

Abstract

With the continued advancement of Large Language Model (LLM) agents in reasoning, planning, and decision-making, benchmarks have become crucial in evaluating these skills. However, there is a notable gap in benchmarks for real-time strategic decision-making. StarCraft II (SC2), with its complex and dynamic nature, serves as an ideal setting for such evaluations. To this end, we have developed TextStarCraft II, a specialized environment for assessing LLMs in real-time strategic scenarios within SC2. Addressing the limitations of traditional Chain of Thought (CoT) methods, we introduce the Chain of Summarization (CoS) method, enhancing LLMs' capabilities in rapid and effective decision-making. Our key experiments included: 1. LLM Evaluation: We tested 10 LLMs in TextStarCraft II, most of which defeated the LV5 built-in AI, showcasing effective strategy skills. 2. Commercial Model Knowledge: We evaluated four commercial models on SC2 knowledge; GPT-4 was ranked highest by Grandmaster-level experts. 3. Human-AI Matches: Experimental results showed that fine-tuned LLMs performed on par with Gold-level players in real-time matches, demonstrating comparable strategic abilities. All code and data from this study have been made publicly available at https://github.com/histmeisah/Large-Language-Models-play-StarCraftII.

Type: Proceedings paper
Title: Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach
Event: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
Open access status: An open access version is available from UCL Discovery
Publisher version: https://papers.nips.cc/paper_files/paper/2024/hash...
Language: English
Additional information: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10207115