UCL Discovery

Frustratingly short attention spans in neural language modeling

Daniluk, M; Rocktäschel, T; Welbl, J; Riedel, S; (2017) Frustratingly short attention spans in neural language modeling. In: 5th International Conference on Learning Representations (ICLR 2017) - Conference Track. International Conference on Learning Representations (ICLR): Toulon, France. Green open access

Text
1702.04521v1.pdf - Accepted Version
Download (461kB)

Abstract

Neural language models predict the next token using a latent representation of the immediate token history. Recently, various methods for augmenting neural language models with an attention mechanism over a differentiable memory have been proposed. To predict the next token, these models query information from a memory of the recent history, which can facilitate learning of mid- and long-range dependencies. However, the conventional attention mechanisms used in memory-augmented neural language models produce a single output vector per time step, which serves both for predicting the next token and as the key and value in a differentiable memory of the token history. In this paper, we propose a neural language model with a key-value attention mechanism that outputs separate representations for the key and value of the differentiable memory, as well as for encoding the next-word distribution. This model outperforms existing memory-augmented neural language models on two corpora. Yet, we found that our method mainly utilizes a memory of the five most recent output representations. This led to the unexpected main finding that a much simpler model, based only on the concatenation of recent output representations from previous time steps, is on par with more sophisticated memory-augmented neural language models.
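The abstract describes two models: a key-value(-predict) attention mechanism in which each output vector yields separate key, value, and prediction representations, and a much simpler baseline that concatenates the output vectors of the last few time steps. The following PyTorch sketch illustrates both ideas; it is reconstructed from the abstract alone, so the class names, projection layers, window handling, and the tanh combination are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeyValuePredictAttention(nn.Module):
    """Sketch of key-value attention: the RNN output at each step is
    projected into three separate representations -- a key (to score
    memory entries), a value (the content returned by attention), and
    a 'predict' vector (used to encode the next-word distribution)."""

    def __init__(self, hidden_size: int, window: int = 5):
        super().__init__()
        self.window = window  # attend over the last `window` steps only
        self.key_proj = nn.Linear(hidden_size, hidden_size)
        self.value_proj = nn.Linear(hidden_size, hidden_size)
        self.predict_proj = nn.Linear(hidden_size, hidden_size)
        self.combine = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, outputs: torch.Tensor) -> torch.Tensor:
        # outputs: (batch, time, hidden) RNN output vectors
        keys = self.key_proj(outputs)
        values = self.value_proj(outputs)
        predict = self.predict_proj(outputs)
        batch, time, hidden = outputs.shape
        contexts = []
        for t in range(time):
            lo = max(0, t - self.window)
            if lo == t:  # no history yet at the first step
                contexts.append(outputs.new_zeros(batch, hidden))
                continue
            mem_k = keys[:, lo:t]    # keys of the recent history
            mem_v = values[:, lo:t]  # matching values
            scores = torch.bmm(mem_k, keys[:, t].unsqueeze(2)).squeeze(2)
            alpha = F.softmax(scores, dim=1)  # attention over the memory
            contexts.append(torch.bmm(alpha.unsqueeze(1), mem_v).squeeze(1))
        context = torch.stack(contexts, dim=1)  # (batch, time, hidden)
        # combine the attention context with the predict representation
        return torch.tanh(self.combine(torch.cat([context, predict], dim=2)))

class ConcatWindowBaseline(nn.Module):
    """Sketch of the abstract's simpler model: concatenate the output
    vectors of the last `window` steps (including the current one) and
    project them back down to the hidden size."""

    def __init__(self, hidden_size: int, window: int = 5):
        super().__init__()
        self.window = window
        self.proj = nn.Linear(window * hidden_size, hidden_size)

    def forward(self, outputs: torch.Tensor) -> torch.Tensor:
        batch, time, hidden = outputs.shape
        # left-pad along time so every step has a full window
        # (the current step plus window - 1 predecessors)
        padded = F.pad(outputs, (0, 0, self.window - 1, 0))
        windows = torch.stack(
            [padded[:, t:t + self.window] for t in range(time)], dim=1
        )  # (batch, time, window, hidden)
        return torch.tanh(self.proj(windows.reshape(batch, time, -1)))

# Usage: both modules map (batch, time, hidden) to (batch, time, hidden).
h = torch.randn(2, 10, 32)
print(KeyValuePredictAttention(32)(h).shape)  # torch.Size([2, 10, 32])
print(ConcatWindowBaseline(32)(h).shape)      # torch.Size([2, 10, 32])
```

With the window fixed at five, ConcatWindowBaseline mirrors the abstract's main finding: a model that merely concatenates the five most recent output representations performs on par with the more sophisticated attention variant.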

Type: Proceedings paper
Title: Frustratingly short attention spans in neural language modeling
Event: 5th International Conference on Learning Representations (ICLR 2017)
Open access status: An open access version is available from UCL Discovery
Publisher version: https://iclr.cc/archive/www/2017.html
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10074784
Downloads since deposit: 40
