Policy Learning from Tutorial Books via Understanding, Rehearsing and Introspecting

Chen, XH; Wang, Z; Du, Y; Jiang, S; Fang, M; Yu, Y; Wang, J; (2024) Policy Learning from Tutorial Books via Understanding, Rehearsing and Introspecting. In: Globerson, A and Mackey, L and Belgrave, D and Fan, A and Paquet, U and Tomczak, J and Zhang, C, (eds.) Advances in Neural Information Processing Systems 37. Neural Information Processing Systems Foundation, Inc. (NeurIPS): Vancouver, Canada. Green open access

Text
7398_Policy_Learning_from_Tuto.pdf - Published Version

Abstract

When humans need to learn a new skill, we can acquire knowledge through written books, including textbooks and tutorials. However, current research on decision-making, such as reinforcement learning (RL), has primarily required numerous real interactions with the target environment to learn a skill, while failing to utilize the knowledge already summarized in text. The success of Large Language Models (LLMs) sheds light on utilizing the knowledge in such books. In this paper, we discuss a new policy learning problem called Policy Learning from tutorial Books (PLfB), built on LLM systems, which aims to leverage rich resources such as tutorial books to derive a policy network. Inspired by how humans learn from books, we solve the problem via a three-stage framework: Understanding, Rehearsing, and Introspecting (URI). In particular, it first understands the books, then rehearses decision-making trajectories based on the derived knowledge, and finally introspects on the imaginary dataset to distill a policy network. We build two benchmarks for PLfB based on Tic-Tac-Toe and Football games. In the experiments, URI's policy achieves a net winning rate of at least 44% against GPT-based agents without any real data. In the much more complex football game, URI's policy beats the built-in AIs with a 37% winning rate, while GPT-based agents achieve only a 6% winning rate. The project page: plfb-football.github.io.

Type: Proceedings paper
Title: Policy Learning from Tutorial Books via Understanding, Rehearsing and Introspecting
Event: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
ISBN-13: 9798331314385
Open access status: An open access version is available from UCL Discovery
Publisher version: https://proceedings.neurips.cc/paper_files/paper/2...
Language: English
Additional information: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10207118