<> <http://www.w3.org/2000/01/rdf-schema#comment> "The repository administrator has not yet configured an RDF license."^^<http://www.w3.org/2001/XMLSchema#string> .
<> <http://xmlns.com/foaf/0.1/primaryTopic> <https://discovery.ucl.ac.uk/id/eprint/10074784> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Article> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/dc/terms/title> "Frustratingly short attention spans in neural language modeling"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/ontology/bibo/abstract> "Neural language models predict the next token using a latent representation of the immediate token history. Recently, various methods for augmenting neural language models with an attention mechanism over a differentiable memory have been proposed. For predicting the next token, these models query information from a memory of the recent history which can facilitate learning mid- and long-range dependencies. However, conventional attention mechanisms used in memory-augmented neural language models produce a single output vector per time step. This vector is used both for predicting the next token as well as for the key and value of a differentiable memory of a token history. In this paper, we propose a neural language model with a key-value attention mechanism that outputs separate representations for the key and value of a differentiable memory, as well as for encoding the next-word distribution. This model outperforms existing memory-augmented neural language models on two corpora. Yet, we found that our method mainly utilizes a memory of the five most recent output representations. This led to the unexpected main finding that a much simpler model based only on the concatenation of recent output representations from previous time steps is on par with more sophisticated memory-augmented neural language models."^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/dc/terms/date> "2019-01-01" .
<https://discovery.ucl.ac.uk/id/document/905851> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Document> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/ontology/bibo/volume> "5" .
<https://discovery.ucl.ac.uk/id/org/ext-5f5d9599f98396306c81682b987f7f15> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> .
<https://discovery.ucl.ac.uk/id/org/ext-5f5d9599f98396306c81682b987f7f15> <http://xmlns.com/foaf/0.1/name> "International Conference on Learning Representations (ICLR)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/dc/terms/publisher> <https://discovery.ucl.ac.uk/id/org/ext-5f5d9599f98396306c81682b987f7f15> .
<https://discovery.ucl.ac.uk/id/publication/ext-61609f584a79e8d2b22b09e0739b339a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Collection> .
<https://discovery.ucl.ac.uk/id/publication/ext-61609f584a79e8d2b22b09e0739b339a> <http://xmlns.com/foaf/0.1/name> "5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/dc/terms/isPartOf> <https://discovery.ucl.ac.uk/id/publication/ext-61609f584a79e8d2b22b09e0739b339a> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/ontology/bibo/status> <http://purl.org/ontology/bibo/status/published> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-a38b774c3215e7e57b16f31d5fa107ed> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/ontology/bibo/authorList> <https://discovery.ucl.ac.uk/id/eprint/10074784#authors> .
<https://discovery.ucl.ac.uk/id/eprint/10074784#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <https://discovery.ucl.ac.uk/id/person/ext-a38b774c3215e7e57b16f31d5fa107ed> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-87230ceccc961f1fbefe3484cb7ec85d> .
<https://discovery.ucl.ac.uk/id/eprint/10074784#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2> <https://discovery.ucl.ac.uk/id/person/ext-87230ceccc961f1fbefe3484cb7ec85d> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-c10c4a40e2ad2e219f97be9cd43d85f7> .
<https://discovery.ucl.ac.uk/id/eprint/10074784#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_3> <https://discovery.ucl.ac.uk/id/person/ext-c10c4a40e2ad2e219f97be9cd43d85f7> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-75fcbdbc2b65be79c55edef84f5180f9> .
<https://discovery.ucl.ac.uk/id/eprint/10074784#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_4> <https://discovery.ucl.ac.uk/id/person/ext-75fcbdbc2b65be79c55edef84f5180f9> .
<https://discovery.ucl.ac.uk/id/person/ext-87230ceccc961f1fbefe3484cb7ec85d> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<https://discovery.ucl.ac.uk/id/person/ext-87230ceccc961f1fbefe3484cb7ec85d> <http://xmlns.com/foaf/0.1/givenName> "T"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-87230ceccc961f1fbefe3484cb7ec85d> <http://xmlns.com/foaf/0.1/familyName> "Rocktäschel"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-87230ceccc961f1fbefe3484cb7ec85d> <http://xmlns.com/foaf/0.1/name> "T Rocktäschel"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-a38b774c3215e7e57b16f31d5fa107ed> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<https://discovery.ucl.ac.uk/id/person/ext-a38b774c3215e7e57b16f31d5fa107ed> <http://xmlns.com/foaf/0.1/givenName> "M"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-a38b774c3215e7e57b16f31d5fa107ed> <http://xmlns.com/foaf/0.1/familyName> "Daniluk"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-a38b774c3215e7e57b16f31d5fa107ed> <http://xmlns.com/foaf/0.1/name> "M Daniluk"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-c10c4a40e2ad2e219f97be9cd43d85f7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<https://discovery.ucl.ac.uk/id/person/ext-c10c4a40e2ad2e219f97be9cd43d85f7> <http://xmlns.com/foaf/0.1/givenName> "J"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-c10c4a40e2ad2e219f97be9cd43d85f7> <http://xmlns.com/foaf/0.1/familyName> "Welbl"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-c10c4a40e2ad2e219f97be9cd43d85f7> <http://xmlns.com/foaf/0.1/name> "J Welbl"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-75fcbdbc2b65be79c55edef84f5180f9> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<https://discovery.ucl.ac.uk/id/person/ext-75fcbdbc2b65be79c55edef84f5180f9> <http://xmlns.com/foaf/0.1/givenName> "S"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-75fcbdbc2b65be79c55edef84f5180f9> <http://xmlns.com/foaf/0.1/familyName> "Riedel"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/person/ext-75fcbdbc2b65be79c55edef84f5180f9> <http://xmlns.com/foaf/0.1/name> "S Riedel"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/ontology/bibo/presentedAt> <https://discovery.ucl.ac.uk/id/event/ext-52fa64041204d302efe2b23821d13216> .
<https://discovery.ucl.ac.uk/id/event/ext-52fa64041204d302efe2b23821d13216> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Conference> .
<https://discovery.ucl.ac.uk/id/event/ext-52fa64041204d302efe2b23821d13216> <http://purl.org/dc/terms/title> "5th International Conference on Learning Representations (ICLR 2017)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/EPrint> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/ProceedingsSectionEPrint> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/dc/terms/isPartOf> <https://discovery.ucl.ac.uk/id/repository> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/document/905851> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> .
<https://discovery.ucl.ac.uk/id/document/905851> <http://www.w3.org/2000/01/rdf-schema#label> "Frustratingly short attention spans in neural language modeling (Text)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://purl.org/dc/elements/1.1/hasVersion> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://eprints.org/ontology/hasPublished> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/document/905851> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10074784/1/1702.04521v1.pdf> .
<https://discovery.ucl.ac.uk/id/document/905851> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10074784/1/1702.04521v1.pdf> .
<https://discovery.ucl.ac.uk/id/eprint/10074784/1/1702.04521v1.pdf> <http://www.w3.org/2000/01/rdf-schema#label> "1702.04521v1.pdf"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/905852> .
<https://discovery.ucl.ac.uk/id/document/905852> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> .
<https://discovery.ucl.ac.uk/id/document/905852> <http://www.w3.org/2000/01/rdf-schema#label> "Frustratingly short attention spans in neural language modeling (Other)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/document/905852> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/document/905852> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/document/905852> <http://eprints.org/relation/islightboxThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/905853> .
<https://discovery.ucl.ac.uk/id/document/905853> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> .
<https://discovery.ucl.ac.uk/id/document/905853> <http://www.w3.org/2000/01/rdf-schema#label> "Frustratingly short attention spans in neural language modeling (Other)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/document/905853> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/document/905853> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/document/905853> <http://eprints.org/relation/ispreviewThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/905854> .
<https://discovery.ucl.ac.uk/id/document/905854> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> .
<https://discovery.ucl.ac.uk/id/document/905854> <http://www.w3.org/2000/01/rdf-schema#label> "Frustratingly short attention spans in neural language modeling (Other)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/document/905854> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/document/905854> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/document/905854> <http://eprints.org/relation/ismediumThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/905855> .
<https://discovery.ucl.ac.uk/id/document/905855> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> .
<https://discovery.ucl.ac.uk/id/document/905855> <http://www.w3.org/2000/01/rdf-schema#label> "Frustratingly short attention spans in neural language modeling (Other)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/document/905855> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/document/905855> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/document/905855> <http://eprints.org/relation/issmallThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/905856> .
<https://discovery.ucl.ac.uk/id/document/905856> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> .
<https://discovery.ucl.ac.uk/id/document/905856> <http://www.w3.org/2000/01/rdf-schema#label> "Frustratingly short attention spans in neural language modeling (Other)"^^<http://www.w3.org/2001/XMLSchema#string> .
<https://discovery.ucl.ac.uk/id/document/905856> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/document/905856> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/document/905856> <http://eprints.org/relation/isIndexCodesVersionOf> <https://discovery.ucl.ac.uk/id/document/905851> .
<https://discovery.ucl.ac.uk/id/eprint/10074784> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <https://discovery.ucl.ac.uk/id/eprint/10074784/> .
<https://discovery.ucl.ac.uk/id/eprint/10074784/> <http://purl.org/dc/elements/1.1/title> "HTML Summary of #10074784 \n\nFrustratingly short attention spans in neural language modeling\n\n" .
<https://discovery.ucl.ac.uk/id/eprint/10074784/> <http://purl.org/dc/elements/1.1/format> "text/html" .
<https://discovery.ucl.ac.uk/id/eprint/10074784/> <http://xmlns.com/foaf/0.1/primaryTopic> <https://discovery.ucl.ac.uk/id/eprint/10074784> .