Structures for Sophisticated Behaviour: Feudal Hierarchies and World Models

Ahilan, Sanjeevan; (2021) Structures for Sophisticated Behaviour: Feudal Hierarchies and World Models. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Abstract

This thesis explores structured, reward-based behaviour in artificial agents and in animals. In Part I we investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from the hierarchical organisation of human societies, we propose the framework of Feudal Multi-agent Hierarchies (FMH), in which coordination of many agents is facilitated by a manager agent. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We show that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use shared rewards. We next investigate training FMH in simulation to solve a complex information gathering task. Our approach introduces a ‘Centralised Policy Actor-Critic’ (CPAC) and an alteration to the conventional multi-agent policy gradient, which allows one multi-agent system to advise the training of another. We further exploit this idea for communicating agents with shared rewards and demonstrate its efficacy. In Part II we examine how animals discover and exploit underlying statistical structure in their environments, even when such structure is difficult to learn and use. By analysing behavioural data from an extended experiment with rats, we show that such hidden structure can indeed be learned, but also that subjects suffer from imperfections in their ability to infer their current state. We account for their behaviour using a Hidden Markov Model, in which recent observations are integrated imperfectly with evidence from the past. We find that over the course of training, subjects learn to track their progress through the task more accurately, a change that our model largely attributes to the more reliable integration of past evidence.

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Structures for Sophisticated Behaviour: Feudal Hierarchies and World Models
Event: UCL (University College London)
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2021. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Life Sciences > Gatsby Computational Neurosci Unit
UCL
URI: https://discovery.ucl.ac.uk/id/eprint/10127984