<> <http://www.w3.org/2000/01/rdf-schema#comment> "The repository administrator has not yet configured an RDF license."^^<http://www.w3.org/2001/XMLSchema#string> . <> <http://xmlns.com/foaf/0.1/primaryTopic> <https://discovery.ucl.ac.uk/id/eprint/10083563> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Article> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/dc/terms/title> "Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/ontology/bibo/abstract> "Trial-and-error based reinforcement learning (RL) has seen rapid advancements in recent times, especially with the advent of deep neural networks. However, the majority of autonomous RL algorithms require a large number of interactions with the environment. A large number of interactions may be impractical in many real-world applications, such as robotics, and many practical systems have to obey limitations in the form of state space or control constraints. To reduce the number of system interactions while simultaneously handling constraints, we propose a model-based RL framework based on probabilistic Model Predictive Control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian Processes (GPs) to incorporate model uncertainty into long-term predictions, thereby, reducing the impact of model errors. We then use MPC to find a control sequence that minimises the expected long-term cost. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach does not only achieve state-of-the-art data efficiency, but also is a principled way for RL in constrained environments."^^<http://www.w3.org/2001/XMLSchema#string> . 
<https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/dc/terms/date> "2018-04-11" . <https://discovery.ucl.ac.uk/id/document/984483> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Document> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/ontology/bibo/volume> "84" . <https://discovery.ucl.ac.uk/id/org/ext-340c7c8aba7ae743d869ca5d3b73cb41> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Organization> . <https://discovery.ucl.ac.uk/id/org/ext-340c7c8aba7ae743d869ca5d3b73cb41> <http://xmlns.com/foaf/0.1/name> "Proceedings of Machine Learning Research (PMLR)"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/dc/terms/publisher> <https://discovery.ucl.ac.uk/id/org/ext-340c7c8aba7ae743d869ca5d3b73cb41> . <https://discovery.ucl.ac.uk/id/publication/ext-54ec8a6920fab3cfa718a743c7a4e4ac> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Collection> . <https://discovery.ucl.ac.uk/id/publication/ext-54ec8a6920fab3cfa718a743c7a4e4ac> <http://xmlns.com/foaf/0.1/name> "AISTATS"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/dc/terms/isPartOf> <https://discovery.ucl.ac.uk/id/publication/ext-54ec8a6920fab3cfa718a743c7a4e4ac> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/ontology/bibo/status> <http://purl.org/ontology/bibo/status/published> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-f1149211dbf0fc4aec5c2a3161064ed7> . 
<https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/ontology/bibo/authorList> <https://discovery.ucl.ac.uk/id/eprint/10083563#authors> . <https://discovery.ucl.ac.uk/id/eprint/10083563#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <https://discovery.ucl.ac.uk/id/person/ext-f1149211dbf0fc4aec5c2a3161064ed7> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/dc/terms/creator> <https://discovery.ucl.ac.uk/id/person/ext-316eea8bcc7d9be9e66ab02256ba1dce> . <https://discovery.ucl.ac.uk/id/eprint/10083563#authors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2> <https://discovery.ucl.ac.uk/id/person/ext-316eea8bcc7d9be9e66ab02256ba1dce> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://www.loc.gov/loc.terms/relators/EDT> <https://discovery.ucl.ac.uk/id/person/ext-09cf9d08ebd76f4a3a8e1a505254ad42> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/ontology/bibo/editorList> <https://discovery.ucl.ac.uk/id/eprint/10083563#editors> . <https://discovery.ucl.ac.uk/id/eprint/10083563#editors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <https://discovery.ucl.ac.uk/id/person/ext-09cf9d08ebd76f4a3a8e1a505254ad42> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://www.loc.gov/loc.terms/relators/EDT> <https://discovery.ucl.ac.uk/id/person/ext-97ac8fbd36dd8ab3636e431a3b19e864> . <https://discovery.ucl.ac.uk/id/eprint/10083563#editors> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2> <https://discovery.ucl.ac.uk/id/person/ext-97ac8fbd36dd8ab3636e431a3b19e864> . 
<https://discovery.ucl.ac.uk/id/person/ext-09cf9d08ebd76f4a3a8e1a505254ad42> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . <https://discovery.ucl.ac.uk/id/person/ext-09cf9d08ebd76f4a3a8e1a505254ad42> <http://xmlns.com/foaf/0.1/givenName> "AJ"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-09cf9d08ebd76f4a3a8e1a505254ad42> <http://xmlns.com/foaf/0.1/familyName> "Storkey"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-09cf9d08ebd76f4a3a8e1a505254ad42> <http://xmlns.com/foaf/0.1/name> "AJ Storkey"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-97ac8fbd36dd8ab3636e431a3b19e864> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . <https://discovery.ucl.ac.uk/id/person/ext-97ac8fbd36dd8ab3636e431a3b19e864> <http://xmlns.com/foaf/0.1/givenName> "F"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-97ac8fbd36dd8ab3636e431a3b19e864> <http://xmlns.com/foaf/0.1/familyName> "Pérez-Cruz"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-97ac8fbd36dd8ab3636e431a3b19e864> <http://xmlns.com/foaf/0.1/name> "F Pérez-Cruz"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-316eea8bcc7d9be9e66ab02256ba1dce> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . <https://discovery.ucl.ac.uk/id/person/ext-316eea8bcc7d9be9e66ab02256ba1dce> <http://xmlns.com/foaf/0.1/givenName> "MP"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-316eea8bcc7d9be9e66ab02256ba1dce> <http://xmlns.com/foaf/0.1/familyName> "Deisenroth"^^<http://www.w3.org/2001/XMLSchema#string> . 
<https://discovery.ucl.ac.uk/id/person/ext-316eea8bcc7d9be9e66ab02256ba1dce> <http://xmlns.com/foaf/0.1/name> "MP Deisenroth"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-f1149211dbf0fc4aec5c2a3161064ed7> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . <https://discovery.ucl.ac.uk/id/person/ext-f1149211dbf0fc4aec5c2a3161064ed7> <http://xmlns.com/foaf/0.1/givenName> "S"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-f1149211dbf0fc4aec5c2a3161064ed7> <http://xmlns.com/foaf/0.1/familyName> "Kamthe"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/person/ext-f1149211dbf0fc4aec5c2a3161064ed7> <http://xmlns.com/foaf/0.1/name> "S Kamthe"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/ontology/bibo/presentedAt> <https://discovery.ucl.ac.uk/id/event/ext-fd9be6e34204cba7d872f3f1b3ffa1ae> . <https://discovery.ucl.ac.uk/id/event/ext-fd9be6e34204cba7d872f3f1b3ffa1ae> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Conference> . <https://discovery.ucl.ac.uk/id/event/ext-fd9be6e34204cba7d872f3f1b3ffa1ae> <http://purl.org/dc/terms/title> "21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), 9-11 April 2018, Lanzarote, Canary Islands, Spain"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/EPrint> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/ProceedingsSectionEPrint> . 
<https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/dc/terms/isPartOf> <https://discovery.ucl.ac.uk/id/repository> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984483> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> . <https://discovery.ucl.ac.uk/id/document/984483> <http://www.w3.org/2000/01/rdf-schema#label> "Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control (Text)"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://purl.org/dc/elements/1.1/hasVersion> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://eprints.org/ontology/hasPublished> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984483> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10083563/1/kamthe18a.pdf> . <https://discovery.ucl.ac.uk/id/document/984483> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10083563/1/kamthe18a.pdf> . <https://discovery.ucl.ac.uk/id/eprint/10083563/1/kamthe18a.pdf> <http://www.w3.org/2000/01/rdf-schema#label> "kamthe18a.pdf"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/984484> . <https://discovery.ucl.ac.uk/id/document/984484> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> . <https://discovery.ucl.ac.uk/id/document/984484> <http://www.w3.org/2000/01/rdf-schema#label> "Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control (Other)"^^<http://www.w3.org/2001/XMLSchema#string> . 
<https://discovery.ucl.ac.uk/id/document/984484> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984484> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984484> <http://eprints.org/relation/islightboxThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984484> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10083563/2/lightbox.jpg> . <https://discovery.ucl.ac.uk/id/document/984484> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10083563/2/lightbox.jpg> . <https://discovery.ucl.ac.uk/id/eprint/10083563/2/lightbox.jpg> <http://www.w3.org/2000/01/rdf-schema#label> "lightbox.jpg"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/984485> . <https://discovery.ucl.ac.uk/id/document/984485> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> . <https://discovery.ucl.ac.uk/id/document/984485> <http://www.w3.org/2000/01/rdf-schema#label> "Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control (Other)"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/document/984485> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984485> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984485> <http://eprints.org/relation/ispreviewThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . 
<https://discovery.ucl.ac.uk/id/document/984485> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10083563/3/preview.jpg> . <https://discovery.ucl.ac.uk/id/document/984485> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10083563/3/preview.jpg> . <https://discovery.ucl.ac.uk/id/eprint/10083563/3/preview.jpg> <http://www.w3.org/2000/01/rdf-schema#label> "preview.jpg"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/984486> . <https://discovery.ucl.ac.uk/id/document/984486> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> . <https://discovery.ucl.ac.uk/id/document/984486> <http://www.w3.org/2000/01/rdf-schema#label> "Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control (Other)"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/document/984486> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984486> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984486> <http://eprints.org/relation/ismediumThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984486> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10083563/4/medium.jpg> . <https://discovery.ucl.ac.uk/id/document/984486> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10083563/4/medium.jpg> . <https://discovery.ucl.ac.uk/id/eprint/10083563/4/medium.jpg> <http://www.w3.org/2000/01/rdf-schema#label> "medium.jpg"^^<http://www.w3.org/2001/XMLSchema#string> . 
<https://discovery.ucl.ac.uk/id/eprint/10083563> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/984487> . <https://discovery.ucl.ac.uk/id/document/984487> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> . <https://discovery.ucl.ac.uk/id/document/984487> <http://www.w3.org/2000/01/rdf-schema#label> "Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control (Other)"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/document/984487> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984487> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984487> <http://eprints.org/relation/issmallThumbnailVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984487> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10083563/5/small.jpg> . <https://discovery.ucl.ac.uk/id/document/984487> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10083563/5/small.jpg> . <https://discovery.ucl.ac.uk/id/eprint/10083563/5/small.jpg> <http://www.w3.org/2000/01/rdf-schema#label> "small.jpg"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://eprints.org/ontology/hasDocument> <https://discovery.ucl.ac.uk/id/document/984488> . <https://discovery.ucl.ac.uk/id/document/984488> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://eprints.org/ontology/Document> . <https://discovery.ucl.ac.uk/id/document/984488> <http://www.w3.org/2000/01/rdf-schema#label> "Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control (Other)"^^<http://www.w3.org/2001/XMLSchema#string> . 
<https://discovery.ucl.ac.uk/id/document/984488> <http://eprints.org/relation/isVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984488> <http://eprints.org/relation/isVolatileVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984488> <http://eprints.org/relation/isIndexCodesVersionOf> <https://discovery.ucl.ac.uk/id/document/984483> . <https://discovery.ucl.ac.uk/id/document/984488> <http://eprints.org/ontology/hasFile> <https://discovery.ucl.ac.uk/id/eprint/10083563/6/indexcodes.txt> . <https://discovery.ucl.ac.uk/id/document/984488> <http://purl.org/dc/terms/hasPart> <https://discovery.ucl.ac.uk/id/eprint/10083563/6/indexcodes.txt> . <https://discovery.ucl.ac.uk/id/eprint/10083563/6/indexcodes.txt> <http://www.w3.org/2000/01/rdf-schema#label> "indexcodes.txt"^^<http://www.w3.org/2001/XMLSchema#string> . <https://discovery.ucl.ac.uk/id/eprint/10083563> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <https://discovery.ucl.ac.uk/id/eprint/10083563/> . <https://discovery.ucl.ac.uk/id/eprint/10083563/> <http://purl.org/dc/elements/1.1/title> "HTML Summary of #10083563 \n\nData-Efficient Reinforcement Learning with Probabilistic Model Predictive Control\n\n" . <https://discovery.ucl.ac.uk/id/eprint/10083563/> <http://purl.org/dc/elements/1.1/format> "text/html" . <https://discovery.ucl.ac.uk/id/eprint/10083563/> <http://xmlns.com/foaf/0.1/primaryTopic> <https://discovery.ucl.ac.uk/id/eprint/10083563> .