Tsipinakis, N;
Nelson, JDB;
(2016)
Sparse temporal difference learning via alternating direction method of multipliers.
In: Kurgan, L and Palade, V and Wani, A, (eds.)
Proceedings of the 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA 2015).
(pp. 220-225).
Institute of Electrical and Electronics Engineers (IEEE)
Text: paper.pdf - Accepted Version (426kB)
Abstract
Recent work in off-line Reinforcement Learning has focused on efficient algorithms to incorporate feature selection, via ℓ1-regularization, into the Bellman operator fixed-point estimators. These developments now mean that over-fitting can be avoided when the number of samples is small compared to the number of features. However, it remains unclear whether existing algorithms have the ability to offer good approximations for the task of policy evaluation and improvement. In this paper, we propose a new algorithm for approximating the fixed-point based on the Alternating Direction Method of Multipliers (ADMM). We demonstrate, with experimental results, that the proposed algorithm is more stable for policy iteration compared to prior work. Furthermore, we also derive a theoretical result that states the proposed algorithm obtains a solution which satisfies the optimality conditions for the fixed-point problem.
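For orientation, the setting named in the abstract can be sketched as follows. This is not the paper's derivation: the symbols (Φ, Φ', R, γ, β, ρ) are assumed notation, the fixed-point problem is written in the form commonly used in the sparse temporal difference literature, and the updates shown are the generic scaled-form ADMM iteration for an ℓ1-penalized objective rather than the specific algorithm proposed by the authors.

```latex
% Assumed notation (not taken from this record):
%   \Phi, \Phi' : feature matrices of sampled states and of their successor states
%   R           : vector of sampled rewards
%   \gamma      : discount factor,  \beta : regularization weight,  \rho > 0 : ADMM penalty
%
% \ell_1-regularized fixed point; w appears on both sides, so this is not a plain lasso:
w = \arg\min_{u} \tfrac{1}{2} \lVert \Phi u - (R + \gamma \Phi' w) \rVert_2^2 + \beta \lVert u \rVert_1

% Generic scaled-form ADMM for \min_{x,z} f(x) + \beta \lVert z \rVert_1 subject to x - z = 0,
% with scaled dual variable y:
x^{k+1} = \arg\min_{x} f(x) + \tfrac{\rho}{2} \lVert x - z^{k} + y^{k} \rVert_2^2
z^{k+1} = S_{\beta/\rho}\left( x^{k+1} + y^{k} \right)
y^{k+1} = y^{k} + x^{k+1} - z^{k+1}

% Soft-thresholding operator, applied elementwise:
S_{\kappa}(a) = \operatorname{sign}(a) \, \max(\lvert a \rvert - \kappa, 0)
```

The splitting places the smooth term in the x-update and the ℓ1 penalty in the z-update, so sparsity enters only through the elementwise soft-thresholding step. How the paper handles the dependence of the target on w is not stated in this record.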
Type: | Proceedings paper |
---|---|
Title: | Sparse temporal difference learning via alternating direction method of multipliers |
Event: | 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), 9-11 December 2015, Miami, Florida, USA |
ISBN-13: | 9781509002870 |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1109/ICMLA.2015.36 |
Publisher version: | http://dx.doi.org/10.1109/ICMLA.2015.36 |
Language: | English |
Additional information: | Copyright © 2015 by The Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences |
URI: | https://discovery.ucl.ac.uk/id/eprint/1477560 |