Optimal Transport for Offline Imitation Learning

Luo, Y; Jiang, Z; Cohen, S; Grefenstette, E; Deisenroth, MP; (2023) Optimal Transport for Offline Imitation Learning. In: Proceedings of the 11th International Conference on Learning Representations, ICLR 2023. OpenReview.net: Kigali, Rwanda. (Green open access)

Full text:	2303.13971v1.pdf (Accepted Version, 934kB)

Abstract

With the advent of large datasets, offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment. However, offline RL requires the dataset to be reward-annotated, which presents practical challenges when reward engineering is difficult or when obtaining reward annotations is labor-intensive. In this paper, we introduce Optimal Transport Reward labeling (OTR), an algorithm that assigns rewards to offline trajectories using a few high-quality demonstrations. OTR's key idea is to use optimal transport to compute an optimal alignment between an unlabeled trajectory in the dataset and an expert demonstration. The alignment yields a similarity measure that can be interpreted as a reward, which an offline RL algorithm can then use to learn the policy. OTR is easy to implement and computationally efficient. On D4RL benchmarks, we show that OTR with a single demonstration can consistently match the performance of offline RL with ground-truth rewards.
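
The reward-labeling idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a squared Euclidean cost between states, uniform weights over time steps, and a plain entropy-regularised Sinkhorn solver. The function names `sinkhorn_plan` and `ot_rewards` and the parameters `epsilon` and `n_iters` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, epsilon=0.05, n_iters=200):
    """Approximate OT plan between two uniform marginals (illustrative)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform weight on each unlabeled step
    b = np.full(m, 1.0 / m)  # uniform weight on each expert step
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        # Alternately rescale rows and columns towards the marginals.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def ot_rewards(traj, expert, epsilon=0.05, n_iters=200):
    """Per-step pseudo-rewards for `traj` from its alignment with `expert`.

    traj:   array of shape (T, d), states of an unlabeled trajectory
    expert: array of shape (T_e, d), states of an expert demonstration
    """
    # Assumed cost: squared Euclidean distance between every pair of states.
    diff = traj[:, None, :] - expert[None, :, :]
    cost = np.sum(diff ** 2, axis=-1)

    # Rescale the cost before exponentiating to keep the kernel well behaved.
    scale = cost.max()
    scaled = cost / scale if scale > 0 else cost
    plan = sinkhorn_plan(scaled, epsilon, n_iters)

    # Reward at step t is the negative transport cost charged to that step.
    return -np.sum(plan * cost, axis=1)
```

Under these assumptions, calling `ot_rewards(traj, expert)` on each unlabeled trajectory returns one reward per time step, with values closer to zero for steps that align cheaply with the demonstration. The relabeled dataset could then be handed to any offline RL algorithm, as the abstract describes.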

Type:	Proceedings paper
Title:	Optimal Transport for Offline Imitation Learning
Event:	11th International Conference on Learning Representations, ICLR 2023
Open access status:	An open access version is available from UCL Discovery
Publisher version:	https://openreview.net/forum?id=MhuFzFsrfvH
Language:	English
Additional information:	This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords:	offline reinforcement learning, optimal transport, imitation learning
UCL classification:	UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery.ucl.ac.uk/id/eprint/10195982
