Jaedeug Choi (최재득)
Jaedeug Choi received his Ph.D. in Computer Science from the Korea Advanced Institute of Science and Technology (KAIST) in 2013. While beginning his career as a postdoctoral scholar at POSTECH, he sadly lost his life in an accident on October 30, 2013.
He will be greatly missed. We will maintain his homepage to remember him.
I received my Ph.D. in Aug. 2013 under the supervision of Prof. Kee-Eung Kim at the Department of Computer Science, Korea Advanced Institute of Science and Technology (KAIST). I am interested in planning, learning, and decision making under uncertainty. My recent research focuses on inverse reinforcement learning and its applications.
Machine Learning Theory
Reinforcement learning, partially observable Markov decision process (POMDP), inverse reinforcement learning (IRL), Bayesian nonparametrics
Machine Learning Applications
Human behavior understanding, collaborative filtering and recommendation, spoken dialog management
Ph.D. in Department of Computer Science, KAIST, Aug. 2013.
Thesis: Models and Algorithms for Inverse Reinforcement Learning
Supervisor: Prof. Kee-Eung Kim
M.S. in Department of Computer Science, KAIST, Aug. 2009.
Thesis: Inverse Reinforcement Learning in Partially Observable Environments
Supervisor: Prof. Kee-Eung Kim
B.S. in Department of Computer Science and Engineering, POSTECH, Feb. 2007.
We describe our experience with engineering the dialog state tracker for the first Dialog State Tracking Challenge (DSTC). Dialog state trackers are essential components of dialog systems, used to infer the true user goal from speech processing results. We explain the main parts of our tracker: the observation model, the belief refinement model, and the belief transformation model. We also report experimental results on a number of approaches to the models, and compare the overall performance of our tracker to other submitted trackers. An extended version of this paper is available as a technical report (Kim et al., 2013).
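As a rough illustration of the belief refinement step, the sketch below shows a minimal Bayesian belief update over user-goal hypotheses. This is a generic textbook update, not the tracker's actual models; all names and numbers are hypothetical.

```python
def update_belief(belief, obs_probs):
    """One Bayesian belief update over goal hypotheses:
    b'(g) is proportional to P(o | g) * b(g).

    belief: dict mapping each goal hypothesis to its current probability.
    obs_probs: dict mapping each goal to P(observation | goal).
    """
    unnorm = {g: obs_probs.get(g, 1e-9) * p for g, p in belief.items()}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

# Hypothetical two-goal dialog: noisy SLU output favors "restaurant".
belief = {"restaurant": 0.5, "hotel": 0.5}
belief = update_belief(belief, {"restaurant": 0.8, "hotel": 0.2})
```

In a real tracker the observation likelihoods would come from the speech-understanding confidence scores rather than being fixed constants.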
Most of the algorithms for inverse reinforcement learning (IRL) assume that the reward function is a linear function of the pre-defined state and action features. However, it is often difficult to manually specify the set of features that can make the true reward function representable as a linear function. We propose a Bayesian nonparametric approach to identifying useful composite features for learning the reward function. The composite features are assumed to be the logical conjunctions of the predefined atomic features so that we can represent the reward function as a linear function of the composite features. We empirically show that our approach is able to learn composite features that capture important aspects of the reward function on synthetic domains, and predict taxi drivers' behaviour with high accuracy on a real GPS trace dataset.
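As a toy sketch of the composite-feature idea, conjunctions of binary atomic features can be enumerated and the reward taken linear in them. The feature names and weights below are hypothetical illustrations, not the paper's learned model or its learning algorithm.

```python
from itertools import combinations

def conjunction_features(atomic, max_order=2):
    """Enumerate candidate composite features as logical conjunctions
    (up to max_order) of predefined atomic binary features."""
    feats = []
    for k in range(1, max_order + 1):
        feats.extend(combinations(atomic, k))
    return feats

def reward(state_atomics, weights):
    """Reward linear in composite features: r(s) = sum_i w_i * f_i(s),
    where f_i(s) = 1 iff every atomic feature in the conjunction holds."""
    return sum(w for combo, w in weights.items()
               if all(state_atomics[a] for a in combo))

atomic = ["on_highway", "rush_hour", "near_airport"]
feats = conjunction_features(atomic, max_order=2)
# Hypothetical weights on two composite features.
weights = {("on_highway",): 1.0, ("on_highway", "rush_hour"): -2.0}
r = reward({"on_highway": True, "rush_hour": True, "near_airport": False},
           weights)
```

The Bayesian nonparametric part of the paper concerns *which* conjunctions to include; this sketch only shows the representation being searched over.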
We present a nonparametric Bayesian approach to inverse reinforcement learning (IRL) for multiple reward functions. Most previous IRL algorithms assume that the behaviour data is obtained from an agent who is optimizing a single reward function, but this assumption is hard to guarantee in practice. Our approach is based on integrating the Dirichlet process mixture model into Bayesian IRL. We provide an efficient Metropolis-Hastings sampling algorithm utilizing the gradient of the posterior to estimate the underlying reward functions, and demonstrate that our approach outperforms previous ones via experiments on a number of problem domains.
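A generic random-walk Metropolis-Hastings sampler illustrates the sampling machinery underlying this approach. The paper's sampler additionally exploits the gradient of the posterior and mixes over a Dirichlet process mixture; this toy version merely targets a standard normal posterior over a single reward weight.

```python
import math
import random

def metropolis_hastings(log_post, x0, steps=5000, scale=0.5, seed=0):
    """Random-walk Metropolis-Hastings: propose x' = x + N(0, scale^2),
    accept with probability min(1, post(x') / post(x))."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(steps):
        prop = x + rng.gauss(0.0, scale)
        lp_prop = log_post(prop)
        if math.log(rng.random()) < lp_prop - lp:  # accept/reject
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Toy log posterior: standard normal over one reward weight.
samples = metropolis_hastings(lambda w: -0.5 * w * w, 0.0)
mean = sum(samples) / len(samples)
```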
With recent advances in unmanned aerial vehicle (UAV) technology, there have been various attempts to deploy multiple UAVs not only for civilian purposes such as agriculture and disaster monitoring, but also for military purposes such as reconnaissance and attack. However, it is difficult for human operators to control each UAV directly when multiple UAVs are deployed, so it is essential to develop algorithms that cooperate autonomously and act effectively to achieve a given goal. This can be cast as a sequential decision-making problem, and decision-theoretic models such as Markov decision processes (MDPs) and their extension to partial or inaccurate observations, partially observable MDPs (POMDPs), allow decision making in complex and uncertain environments to be treated statistically. In this paper, we show that dynamic task allocation and reconnaissance missions with multiple UAVs can be efficiently optimized using POMDPs, and that POMDPs achieve better performance than MDPs when sensor observations are subject to error. We also demonstrate through simulation with a real quadcopter that the POMDP policy works well in a realistic environment.
The difficulty in inverse reinforcement learning (IRL) arises in choosing the best reward function since there are typically an infinite number of reward functions that yield the given behaviour data as optimal. Using a Bayesian framework, we address this challenge by using the maximum a posteriori (MAP) estimation for the reward function, and show that most of the previous IRL algorithms can be modeled into our framework. We also present a gradient method for the MAP estimation based on the (sub)differentiability of the posterior distribution. We show the effectiveness of our approach by comparing the performance of the proposed method to those of the previous algorithms.
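A minimal sketch of the MAP idea: gradient ascent on a (sub)differentiable log posterior. The quadratic prior and likelihood terms below are purely illustrative stand-ins; the paper's posterior is over full reward functions.

```python
def map_estimate(grad_log_post, w0, lr=0.1, iters=500):
    """Gradient ascent toward the MAP estimate, assuming a
    (sub)gradient of the log posterior is available."""
    w = w0
    for _ in range(iters):
        w = w + lr * grad_log_post(w)
    return w

# Toy log posterior: log p(w | D) = -0.5 * (w - 2)^2 - 0.25 * w^2
# (a quadratic "likelihood" peaked at 2 plus a Gaussian-style prior at 0),
# so its gradient is -(w - 2) - 0.5 * w and the MAP sits between them.
grad = lambda w: -(w - 2.0) - 0.5 * w
w_map = map_estimate(grad, 0.0)
# Analytic optimum: w = 2 / 1.5 = 4/3.
```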
Inverse reinforcement learning (IRL) is the problem of recovering the underlying reward function from the behavior of an expert. Most of the existing IRL algorithms assume that the environment is modeled as a Markov decision process (MDP), although it is desirable to handle partially observable settings in order to cover more realistic scenarios. In this paper, we present IRL algorithms for partially observable environments that can be modeled as a partially observable Markov decision process (POMDP). We deal with two cases according to the representation of the given expert's behavior, namely the case in which the expert's policy is explicitly given, and the case in which the expert's trajectories are available instead. IRL in POMDPs poses a greater challenge than in MDPs since it is not only ill-posed due to the nature of IRL, but also computationally intractable due to the hardness of solving POMDPs. To overcome these obstacles, we present algorithms that exploit some of the classical results from the POMDP literature. Experimental results on several benchmark POMDP domains show that our work is useful for partially observable settings.
Inverse reinforcement learning (IRL) is the problem of recovering the underlying reward function from the behaviour of an expert. Most of the existing algorithms for IRL assume that the expert's environment is modeled as a Markov decision process (MDP), although they should be able to handle partially observable settings in order to widen the applicability to more realistic scenarios. In this paper, we present an extension of the classical IRL algorithm by Ng and Russell to partially observable environments. We discuss technical issues and challenges, and present the experimental results on some of the benchmark partially observable domains.
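The constraint at the heart of this line of work is that the expert's policy must remain optimal under any candidate reward. That condition can be checked on a tiny fully observable MDP via value iteration, as in the illustrative sketch below (a generic MDP check, not the paper's POMDP algorithm; the two-state chain is hypothetical).

```python
def optimal_q(R, P, gamma=0.9, iters=500):
    """Value iteration for the optimal Q-function:
    Q(s, a) = R(s) + gamma * sum_s' P[a][s][s'] * max_a' Q(s', a')."""
    n_s, n_a = len(R), len(P)
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        V = [max(row) for row in Q]
        Q = [[R[s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
    return Q

def policy_is_optimal(policy, R, P, gamma=0.9):
    """The core IRL constraint: the expert's action must maximize
    the optimal Q-value in every state."""
    Q = optimal_q(R, P, gamma)
    return all(Q[s][policy[s]] >= max(Q[s]) - 1e-9 for s in range(len(R)))

# Hypothetical two-state chain; P[a][s][s'] are transition probabilities.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay put
    [[0.0, 1.0], [0.0, 1.0]],  # action 1: go to state 1
]
expert = [1, 0]  # move to state 1, then stay
consistent = policy_is_optimal(expert, [0.0, 1.0], P)  # reward in state 1
trivial = policy_is_optimal(expert, [0.0, 0.0], P)     # zero reward also fits
```

Note that the all-zero reward also satisfies the constraint, which is exactly the ill-posedness that motivates the heuristics in the IRL literature.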
NIPS 2013, ACML 2013
Institute of Electrical and Electronics Engineers (IEEE)
Korean Institute of Information Scientists and Engineers (KIISE)
IRL tutorial (slides) at Pattern Recognition and Machine Learning Summer School 2013 (hosted by KIISE)
Participated in Dialog State Tracking Challenge (DSTC) 2013
Awards & Scholarships
2012 NIPS travel award
2011 NIPS travel award
2010 Outstanding M.S. thesis award in Department of Computer Science, KAIST
2009 IJCAI travel award
2007 Best oral presentation at Undergraduate Research Program, POSTECH
Department of Computer Science
Korea Advanced Institute of Science and Technology (KAIST)
335 Gwahangno, Yuseong-gu, Daejeon, 305-701, Republic of Korea
Office: E3-1 #2418
E-mail: jdchoi [at] ai [dot] kaist [dot] ac [dot] kr