Average-Reward Reinforcement Learning Using Insights from Non-Equilibrium Statistical Mechanics
ORAL
Abstract
In reinforcement learning (RL), an agent sequentially interacts with an environment in order to discover optimal behaviors. Most established frameworks for RL feature discounting, wherein future rewards are weighted less than immediate rewards. Although discounting leads to useful bounds and convergence properties, it is not appropriate for problems in which there is no principled basis for down-weighting future rewards. To address this issue, we consider an alternative framework based on the average-reward formalism, which optimizes the long-term average reward. Notably, when coupled with entropy regularization, the average-reward formulation establishes a connection with the well-studied problem of free energy minimization in nonequilibrium statistical mechanics (NESM). Utilizing this connection, we show how concepts and tools based on large deviation theory in NESM can be leveraged to develop novel algorithms for solving RL problems with stochastic dynamics.
* NSF Award No. DMS-1854350; NSF Award No. DIIS-2246221
Publication: "Entropy regularized reinforcement learning using large deviation theory": Phys. Rev. Research 5, 023085
"Bayesian inference approach for entropy regularized reinforcement learning with stochastic dynamics": PMLR 216:99-109, 2023
Presenters
-
Jacob Adamczyk
University of Massachusetts Boston
Authors
-
Jacob Adamczyk
University of Massachusetts Boston
-
Argenis Arriojas Maldonado
University of Massachusetts Boston
-
Stas Tiomkin
San Jose State University
-
Rahul V Kulkarni
University of Massachusetts Boston