dc.contributor.author |
Gampa, P. |
|
dc.contributor.author |
Kondamudi, S.S. |
|
dc.contributor.author |
Kailasam, L. |
|
dc.date.accessioned |
2021-01-05T05:22:41Z |
|
dc.date.available |
2021-01-05T05:22:41Z |
|
dc.date.issued |
2019-02 |
|
dc.identifier.isbn |
978-1-7281-2662-3 |
|
dc.identifier.uri |
http://localhost:8080/xmlui/handle/123456789/1234 |
|
dc.description.abstract |
We consider the finite-horizon continuous reinforcement learning problem. Our contribution is three-fold. First, we give a tractable algorithm based on optimistic value iteration for the problem. Next, we give a lower bound on regret of order Ω(T^(2/3)) for any algorithm that discretizes the state space, improving the previous bound of Ω(T^(1/2)) of Ortner and Ryabko [1] for the same problem. Next, under the assumption that the rewards and transitions are Hölder continuous, we show that the upper bound on the discretization error is const·L·n^(-α)·T. Finally, we give some simple experiments to validate our propositions. © 2019 IEEE. |
en_US |
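
The abstract refers to an algorithm based on optimistic value iteration over a discretized state space. The following minimal Python sketch is not taken from the paper; it only illustrates the general idea of finite-horizon value iteration with a count-based optimism bonus on a discretized state space. All names (optimistic_value_iteration, bonus_scale, n_bins) and the specific bonus form (bonus_scale / sqrt(count)) are assumptions made for this illustration.

# Illustrative sketch, not the authors' implementation.
import numpy as np

def optimistic_value_iteration(P_hat, R_hat, counts, horizon, bonus_scale=1.0):
    """Finite-horizon value iteration with a count-based optimism bonus.

    P_hat:  (S, A, S) empirical transition probabilities over discretized states
    R_hat:  (S, A)    empirical mean rewards
    counts: (S, A)    visit counts used to size the exploration bonus
    """
    S, A = R_hat.shape
    V = np.zeros((horizon + 1, S))      # V[h] = optimistic value with h steps to go
    Q = np.zeros((horizon, S, A))
    bonus = bonus_scale / np.sqrt(np.maximum(counts, 1))   # optimism term (assumed form)
    for h in range(1, horizon + 1):
        Q[h - 1] = R_hat + bonus + P_hat @ V[h - 1]        # one-step optimistic backup
        V[h] = Q[h - 1].max(axis=1)
    return Q, V

# Example usage on a toy discretization of [0, 1] into n_bins cells:
if __name__ == "__main__":
    n_bins, n_actions, H = 8, 2, 5
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(n_bins), size=(n_bins, n_actions))  # random transitions
    R = rng.uniform(size=(n_bins, n_actions))                     # random mean rewards
    N = rng.integers(1, 50, size=(n_bins, n_actions))             # fake visit counts
    Q, V = optimistic_value_iteration(P, R, N, H)
    print(V[H])  # optimistic value estimates at the start of an episode
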
dc.language.iso |
en_US |
en_US |
dc.publisher |
Institute of Electrical and Electronics Engineers Inc. |
en_US |
dc.relation.ispartofseries |
Proceedings - 2019 2nd International Conference on Intelligent Autonomous Systems, ICoIAS 2019; |
|
dc.subject |
Reinforcement Learning |
en_US |
dc.subject |
Markov Decision Process (MDP) |
en_US |
dc.subject |
Regret |
en_US |
dc.subject |
Continuous State Space |
en_US |
dc.subject |
Bonus |
en_US |
dc.subject |
Finite Horizon |
en_US |
dc.title |
A Tractable Algorithm for Finite-Horizon Continuous Reinforcement Learning |
en_US |
dc.type |
Article |
en_US |