UCL Discovery

Temporal-Difference Reinforcement Learning with Distributed Representations

Kurth-Nelson, Z; Redish, AD; (2009) Temporal-Difference Reinforcement Learning with Distributed Representations. PLOS ONE, 4 (10), Article e7362. 10.1371/journal.pone.0007362. Green open access


Abstract

Formal Correction: This article has been formally corrected to address the following errors. 1. The initials for author A. David Redish appear incorrectly in the Author Contributions. The correct initials are ADR. The Author Contributions should read: "Conceived and designed the experiments: ZKN ADR. Performed the experiments: ZKN ADR ..."

Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially-discounting "micro-Agents" (μAgents), each of which has a separate discounting factor (gamma). Each μAgent maintains an independent hypothesis about the state of the world, and a separate value-estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (delta) signal within the model matches dopamine signals recorded from animals in standard conditioning reward-paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each μAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments.
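The abstract's claim that distributed exponential discounting yields hyperbolic discounting can be illustrated numerically. The following is a minimal sketch, not the authors' model: it assumes the discount factors (gamma) are spread uniformly over the interval (0, 1), and the function names and the number of agents are hypothetical choices made only for this illustration. Under that assumption, the average of gamma^d over the population approaches 1 / (1 + d), a hyperbolic function of the delay d.

```python
def average_exponential_discount(delay, n_agents=10_000):
    """Mean of gamma**delay over a population of exponential discounters.

    Assumption (for illustration only): the discount factors are evenly
    spaced midpoints in (0, 1), approximating a uniform distribution.
    """
    gammas = [(i + 0.5) / n_agents for i in range(n_agents)]
    return sum(g ** delay for g in gammas) / n_agents


def hyperbolic_discount(delay):
    """Hyperbolic discount function 1 / (1 + delay)."""
    return 1.0 / (1.0 + delay)


if __name__ == "__main__":
    # Compare the population average with the hyperbolic curve.
    for d in (0, 1, 2, 5, 10):
        print(d, average_exponential_discount(d), hyperbolic_discount(d))
```

Each individual discounter here is strictly exponential; only the population average is hyperbolic, because the integral of gamma^d over gamma from 0 to 1 equals 1 / (d + 1). A different distribution of gamma would give a different aggregate curve.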

Type: Article
Title: Temporal-Difference Reinforcement Learning with Distributed Representations
Open access status: An open access version is available from UCL Discovery
DOI: 10.1371/journal.pone.0007362
Publisher version: http://dx.doi.org/10.1371/journal.pone.0007362
Language: English
Additional information: © 2009 Kurth-Nelson, Redish. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. This work was supported by a fellowship on NSF-IGERT #9870633 (ZKN), by a Sloan Fellowship (ADR), and by NIH DA024080, as well as by a Career Development Award from the University of Minnesota Transdisciplinary Tobacco Use Research Center (TTURC) (ADR). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Keywords: MIDBRAIN DOPAMINE NEURONS, BASAL GANGLIA, NUCLEUS-ACCUMBENS, REWARD-PREDICTION, DECISION-MAKING, DELAYED REWARDS, NEURAL ACTIVITY, FUTURE REWARDS, HUMAN BRAIN, HIPPOCAMPUS
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > UCL Queen Square Institute of Neurology
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > UCL Queen Square Institute of Neurology > Imaging Neuroscience
URI: https://discovery.ucl.ac.uk/id/eprint/1346634
Downloads since deposit: 101
