Aversive reinforcement learning.
Doctoral thesis, UCL (University College London).
We hypothesise that human aversive learning can be described algorithmically by Reinforcement Learning models. Our first experiment uses a second-order conditioning design to study sequential outcome prediction. We show that aversive prediction errors are expressed robustly in the ventral striatum, supporting the validity of temporal difference algorithms (as in reward learning), and suggesting a putative critical area for appetitive-aversive interactions. With this in mind, the second experiment explores the nature of pain relief, which, as expounded in theories of motivational opponency, is rewarding. In a Pavlovian conditioning task with phasic relief of tonic noxious thermal stimulation, we show that both appetitive and aversive prediction errors are co-expressed in anatomically dissociable regions (in a mirror opponent pattern) and that striatal activity appears to reflect integrated appetitive-aversive processing. Next we designed a Pavlovian task in which cues predicted either financial gains, losses, or both, thereby forcing integration of both motivational streams. This showed anatomical dissociation of aversive and appetitive predictions along a posterior-anterior gradient within the striatum, respectively. Lastly, we studied aversive instrumental control (avoidance). We designed a simultaneous pain avoidance and financial reward learning task, in which subjects had to learn independently about each, and trade off aversive and appetitive predictions. We show that predictions for both converge on the medial head of caudate nucleus, suggesting that this is a critical site for appetitive-aversive integration in instrumental decision making. We also tested whether serotonin (5-HT) modulates either phasic or tonic opponency using acute tryptophan depletion.
Both behavioural and imaging data confirm the latter, in which it appears to mediate an average reward term, providing an aspiration level against which the benefits of exploration are judged. In summary, our data provide a basic computational and neuroanatomical framework for human aversive learning. We demonstrate the algorithmic and implementational validity of reinforcement learning models for both aversive prediction and control, illustrate the nature and neuroanatomy of appetitive-aversive integration, and discover the critical (and somewhat unexpected) central role for the striatum.
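As an illustration of the temporal difference account referred to in the abstract, the following is a minimal sketch of TD(0) learning in a second-order conditioning chain (cue A, then cue B, then an aversive outcome). It is not the model fitted in the thesis: the function name `td_second_order`, the learning rate, the discount factor, and the outcome magnitude of -1 are all assumptions chosen for illustration.

```python
def td_second_order(n_trials=200, alpha=0.1, gamma=1.0, outcome=-1.0):
    """TD(0) over a two-cue chain: cue A -> cue B -> aversive outcome.

    All parameter values are illustrative assumptions, not fitted values.
    Returns the learned cue values and the per-trial prediction errors.
    """
    values = {"A": 0.0, "B": 0.0}  # predicted outcome value for each cue
    history = []                   # (delta at cue A, delta at cue B) per trial
    for _ in range(n_trials):
        # Step 1: cue A is followed by cue B; no outcome is delivered yet,
        # so the error depends only on the value currently assigned to B.
        delta_a = 0.0 + gamma * values["B"] - values["A"]
        values["A"] += alpha * delta_a
        # Step 2: cue B is followed by the outcome and the trial ends,
        # so the value of the terminal state is zero.
        delta_b = outcome + gamma * 0.0 - values["B"]
        values["B"] += alpha * delta_b
        history.append((delta_a, delta_b))
    return values, history


if __name__ == "__main__":
    values, history = td_second_order()
    print("learned values:", values)
    print("first three trials (delta_A, delta_B):", history[:3])
```

In this sketch the prediction error initially occurs only at the outcome; once cue B has acquired a negative value, an aversive prediction error also appears at the transition from cue A to cue B. This transfer of the error to the earliest predictor is the signature that a second-order design is intended to expose.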
|Title:||Aversive reinforcement learning|
|Open access status:||An open access version is available from UCL Discovery|
|UCL classification:||UCL > School of Life and Medical Sciences > Faculty of Brain Sciences > Institute of Neurology > Imaging Neuroscience|