Approximate Newton Methods for Policy Search in Markov Decision Processes

Advanced search
Browse by:

Department | Year

UCL Theses | Latest

Deposit your research

Approximate Newton Methods for Policy Search in Markov Decision Processes

Furmston, T; Lever, G; Barber, D; (2016) Approximate Newton Methods for Policy Search in Markov Decision Processes. Journal of Machine Learning Research , 17 , Article 227. Green open access

Preview

Text
Barber_15-414.pdf - Published Version
Download (543kB) | Preview

Abstract

Approximate Newton methods are standard optimization tools which aim to maintain the benefits of Newton's method, such as a fast rate of convergence, while alleviating its drawbacks, such as computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov decision processes (MDPs). We first analyse the structure of the Hessian of the total expected reward, which is a standard objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs and we use this analysis to motivate two Gauss-Newton methods for MDPs. Like the Gauss- Newton method for non-linear least squares, these methods drop certain terms in the Hessian. The approximate Hessians possess desirable properties, such as negative definiteness, and we demonstrate several important performance guarantees including guaranteed ascent directions, invariance to affine transformation of the parameter space and convergence guarantees. We finally provide a unifying perspective of key policy search algorithms, demonstrating that our second Gauss- Newton algorithm is closely related to both the EM-algorithm and natural gradient ascent applied to MDPs, but performs significantly better in practice on a range of challenging domains.

Type:	Article
Title:	Approximate Newton Methods for Policy Search in Markov Decision Processes
Open access status:	An open access version is available from UCL Discovery
Publisher version:	http://jmlr.org/papers/v17/15-414.html
Language:	English
Additional information:	This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords:	Science & Technology, Technology, Automation & Control Systems, Computer Science, Artificial Intelligence, Computer Science, Markov Decision Processes, Reinforcement Learning, Newton Method, Function Approximation, Actor-Critic Algorithms, Gradient Estimation, Infinite-Horizon, Neural-Networks, Systems
UCL classification:	UCL UCL > Provost and Vice Provost Offices UCL > Provost and Vice Provost Offices > UCL BEAMS UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery.ucl.ac.uk/id/eprint/1564553

Downloads since deposit

31Downloads

Download activity - last month

Download activity - last 12 months

Downloads by country - last 12 months

Archive Staff Only

View Item