Skip to content. | Skip to navigation

Personal tools
Log in
You are here: Home Papers Monte-Carlo Tree Search for Constrained MDPs

Jongmin Lee, Geon-Hyeong Kim, Pascal Poupart, and Kee-Eung Kim (2018)

Monte-Carlo Tree Search for Constrained MDPs

In: ICML/IJCAI/AAMAS Workshop on Planning and Learning (PAL).

Monte-Carlo Tree Search (MCTS) is the state-of-the-art online planning algorithm for very large MDPs. However, many real-world problems inherently have multiple goals, where multi-objective sequential decision models are more natural. The constrained MDP (CMDP) is such a model that maximizes the reward while constraining the cost. The common solution method for CMDPs is linear programming (LP), which is hardly applicable to large real-world problems. In this paper, we present CCUCT (Cost-Constrained UCT), an online planning algorithm for large constrained MDPs (CMDPs) that leverages the optimization of LP-induced parameters. We show that CCUCT converges to the optimal stochastic action selection in CMDPs and it is able to solve very large CMDPs through experiments on the multi-objective version of an Atari 2600 arcade game.