UCL Discovery

Behavior policy learning: Learning multi-stage tasks via solution sketches and model-based controllers

Tsinganos, K; Chatzilygeroudis, K; Hadjivelichkov, D; Komninos, T; Dermatas, E; Kanoulas, D; (2022) Behavior policy learning: Learning multi-stage tasks via solution sketches and model-based controllers. Frontiers in Robotics and AI, 9, Article 974537. 10.3389/frobt.2022.974537. Green open access

frobt-09-974537.pdf - Published Version

Abstract

Multi-stage tasks are a challenge for reinforcement learning methods, and require either specific task knowledge (e.g., task segmentation) or a large amount of interaction time to be learned. In this paper, we propose Behavior Policy Learning (BPL), which combines 1) only a few solution sketches, that is, demonstrations that contain only the states and not the actions, 2) model-based controllers, and 3) simulations to solve multi-stage tasks without strong knowledge about the underlying task. Our main intuition is that solution sketches alone can provide strong data for learning a high-level trajectory by imitation, and that model-based controllers can be used to follow this trajectory (which we call a behavior) effectively. Finally, we use robotic simulations to further improve the policy and make it robust in a Sim2Real style. We evaluate our method in simulation with a robotic manipulator that has to perform two tasks with variations: 1) grasp a box and place it in a basket, and 2) reposition a book on a different level within a bookcase. We also validate the Sim2Real capabilities of our method by performing real-world experiments and realistic simulated experiments where the objects are tracked through an RGB-D camera for the first task.
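
The core idea in the abstract (state-only demonstrations define a trajectory, and a model-based controller supplies the actions needed to follow it) can be illustrated with a toy example. The sketch below is not the authors' BPL implementation: it assumes a 2-D point with a known single-integrator model, averages the demonstrations instead of learning a policy by imitation, and omits the simulation-based improvement step entirely. The function names `behavior_from_sketches` and `track`, and all parameter values, are hypothetical.

```python
import numpy as np

def behavior_from_sketches(sketches):
    """Average state-only demonstrations into one reference trajectory.

    Assumption: a plain mean stands in for the imitation-learning step.
    """
    return np.mean(np.stack(sketches), axis=0)

def track(reference, x0, dt=0.1, gain=5.0, substeps=20):
    """Follow the reference with a proportional controller.

    Assumption: the model is a single integrator, x_next = x + u * dt.
    The demonstrations contain no actions; the controller computes u.
    """
    x = np.asarray(x0, dtype=float)
    visited = []
    for target in reference:
        for _ in range(substeps):
            u = gain * (target - x)  # action chosen by the controller
            x = x + u * dt           # one step of the assumed model
        visited.append(x.copy())
    return np.array(visited)

# Five noisy state-only demonstrations of the same 2-D path (toy data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 11)[:, None]
ideal = np.hstack([t, t ** 2])
sketches = [ideal + 0.01 * rng.standard_normal(ideal.shape) for _ in range(5)]

reference = behavior_from_sketches(sketches)
visited = track(reference, x0=[0.0, 0.0])
max_error = float(np.max(np.abs(visited - reference)))
print(f"max tracking error: {max_error:.2e}")
```

With these gains the tracking error halves on every inner step, so the point settles very close to each waypoint before moving to the next. A real manipulator would need a far richer model and controller than this.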

Type: Article
Title: Behavior policy learning: Learning multi-stage tasks via solution sketches and model-based controllers
Location: Switzerland
Open access status: An open access version is available from UCL Discovery
DOI: 10.3389/frobt.2022.974537
Publisher version: https://doi.org/10.3389/frobt.2022.974537
Language: English
Additional information: © 2022 Tsinganos, Chatzilygeroudis, Hadjivelichkov, Komninos, Dermatas and Kanoulas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Keywords: evolutionary strategies, imitation learning, multi-stage tasks, reinforcement learning, sim2real
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10158852
