KAIST AIPR Lab Artificial Intelligence & Probabilistic Reasoning Lab

Publications

2024

Haanvid Lee, Tri Wahyu Guntara, Jongmin Lee, Yung-Kyun Noh, Kee-Eung Kim: Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies. International Conference on Learning Representations (ICLR) (to appear). 2024. Spotlight

We consider off-policy evaluation (OPE) of deterministic target policies for reinforcement learning (RL) in environments with continuous action spaces. While it is common to use importance sampling for OPE, it suffers from high variance when the behavior policy deviates significantly from the target policy. To address this issue, some recent works on OPE proposed in-sample learning with importance resampling. Yet, these approaches are not applicable to deterministic target policies for continuous action spaces. To address this limitation, we propose to relax the deterministic target policy using a kernel and learn the kernel metrics that minimize the overall mean squared error of the estimated temporal difference update vector of an action value function, where the action value function is used for policy evaluation. We derive the bias and variance of the estimation error due to this relaxation and provide analytic solutions for the optimal kernel metric. In empirical studies using various test domains, we show that OPE with in-sample learning using the kernel with the optimized metric achieves significantly higher accuracy than other baselines.

2023

최윤선, 반성현, 김기응: A Reinforcement Learning Study on Interpretable Prompt Optimization (해석 가능한 프롬프트 최적화에 관한 강화학습 연구). Korea Software Congress (KSC) 2023, Korean Institute of Information Scientists and Engineers (KIISE). 2023.

Large language models have the potential to be applied effectively to a wide range of natural language tasks, but conventional fine-tuning, which updates a large number of pre-trained parameters to fit a specific downstream task, requires substantial computing resources. Various prompting methods have therefore been proposed that add a prompt to the original input and learn the prompt instead of updating all parameters, but they share a common limitation: the learned prompts are impossible for a person to interpret, or are unfamiliar to read. To address this limitation, this paper proposes an interpretable prompt optimization algorithm that, building on behavior cloning from reinforcement learning, makes use of example prompts that are familiar to people.

Haeju Lee*, Minchan Jeong*, Se-Young Yun, and Kee-Eung Kim: Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning. Findings of Empirical Methods in Natural Language Processing (EMNLP). 2023.

Prompt tuning, in which prompts are optimized to adapt large-scale pre-trained language models to downstream tasks instead of fine-tuning the full model parameters, has been shown to be particularly effective when the prompts are trained in the multi-task transfer learning setting. These methods generally involve individually training prompts for each source task and then aggregating them to provide the initialization of the prompt for the target task. However, this approach critically ignores the fact that some of the source tasks could be negatively or positively interfering with each other. We argue that when we extract knowledge from source tasks via training source prompts, we need to consider this correlation among source tasks for better transfer to target tasks. To this end, we propose a Bayesian approach where we work with the posterior distribution of prompts across source tasks. We obtain representative source prompts corresponding to the samples from the posterior utilizing Stein Variational Gradient Descent, which are then aggregated to constitute the initial target prompt. We show extensive experimental results on standard benchmark NLP tasks, where our Bayesian multi-task transfer learning approach outperforms the state-of-the-art methods in many settings. Furthermore, our approach requires no auxiliary models other than the prompt itself, achieving a high degree of parameter efficiency.
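The abstract above relies on Stein Variational Gradient Descent (SVGD) to draw representative samples from a posterior. As a rough illustration only, the following sketch runs the generic SVGD update with a fixed-bandwidth RBF kernel on a toy one-dimensional standard normal target. The function names, the bandwidth, the step size, and the toy target are all assumptions made for this example; it is not the paper's prompt posterior or its implementation.

```python
import numpy as np

def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One generic SVGD update for particles of shape (n, d), RBF kernel."""
    n = particles.shape[0]
    # diffs[i, j] = x_i - x_j
    diffs = particles[:, None, :] - particles[None, :, :]
    # kernel[i, j] = exp(-||x_i - x_j||^2 / (2 h^2))
    kernel = np.exp(-np.sum(diffs**2, axis=-1) / (2.0 * bandwidth**2))
    # Attractive term: kernel-weighted average of the score at all particles.
    attract = kernel @ grad_log_p(particles)
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) = sum_j k_ij (x_i - x_j) / h^2.
    repel = np.einsum("ij,ijd->id", kernel, diffs) / bandwidth**2
    return particles + step_size * (attract + repel) / n

def run_svgd(particles, grad_log_p, steps=500, **kwargs):
    """Apply svgd_step repeatedly and return the final particles."""
    for _ in range(steps):
        particles = svgd_step(particles, grad_log_p, **kwargs)
    return particles

# Toy target (an assumption for illustration): standard normal, score = -x.
init = np.linspace(2.0, 4.0, 20).reshape(-1, 1)
samples = run_svgd(init, lambda x: -x)
```

The repulsive term is what keeps the particles from collapsing onto the mode, which is why a small set of particles can stand in for diverse posterior samples.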

Seokin Seo, HyeongJoo Hwang, Hongseok Yang, and Kee-Eung Kim: Regularized Behavior Cloning for Blocking the Leakage of Past Action Information. Advances in Neural Information Processing Systems (NeurIPS). 2023. Spotlight

For partially observable environments, imitation learning with observation histories (ILOH) assumes that control-relevant information is sufficiently captured in the observation histories for imitating the expert actions. In the offline setting where the agent is required to learn to imitate without interaction with the environment, behavior cloning (BC) has been shown to be a simple yet effective method for imitation learning. However, when the information about the actions executed in the past timesteps leaks into the observation histories, ILOH via BC often ends up imitating its own past actions. In this paper, we address this catastrophic failure by proposing a principled regularization for BC, which we name Past Action Leakage Regularization (PALR). The main idea behind our approach is to leverage the classical notion of conditional independence to mitigate the leakage. We compare different instances of our framework with natural choices of conditional independence metric and its estimator. The results of our comparison support the use of a particular kernel-based estimator for the conditional independence metric. We conduct an extensive set of experiments on benchmark datasets in order to assess the effectiveness of our regularization method. The experimental results show that our method significantly outperforms prior related approaches, highlighting its potential to successfully imitate expert actions when the past action information leaks into the observation histories.

Daiki E. Matsunaga*, Jongmin Lee*, Jaeseok Yoon, Stefanos Leonardos, Pieter Abbeel, and Kee-Eung Kim: AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation. Advances in Neural Information Processing Systems (NeurIPS). 2023.

One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy. This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation. This challenge is amplified in the offline Multi-Agent RL (MARL) setting since the joint action space grows exponentially with the number of agents. To avoid this curse of dimensionality, existing MARL methods adopt either value decomposition methods or fully decentralized training of individual agents. However, even when combined with standard conservatism principles, these methods can still result in the selection of OOD joint actions in offline MARL. To this end, we introduce AlberDICE, an offline MARL algorithm that alternately performs centralized training of individual agents based on stationary distribution optimization. AlberDICE circumvents the exponential complexity of MARL by computing the best response of one agent at a time while effectively avoiding OOD joint action selection. Theoretically, we show that the alternating optimization procedure converges to Nash policies. In the experiments, we demonstrate that AlberDICE significantly outperforms baseline algorithms on a standard suite of MARL benchmarks.

Jaeseok Yoon*, Seunghyun Hwang*, Ran Han, Jeonguk Bang, and Kee-Eung Kim: Adapting Text-based Dialogue State Tracker for Spoken Dialogues. Special Interest Group on Discourse and Dialogue (SIGDIAL) DSTC11 Workshop. 2023. Track best paper

Although there have been remarkable advances in dialogue systems through the dialogue systems technology challenge (DSTC), building a robust task-oriented dialogue system with a speech interface remains one of the key challenges. Most of the progress has been made for text-based dialogue systems since there are abundant datasets with written corpora while those with spoken dialogues are very scarce. However, as can be seen from voice assistant systems such as Siri and Alexa, it is of practical importance to transfer the success to spoken dialogues. In this paper, we describe our engineering effort in building a highly successful model that participated in the speech-aware dialogue systems technology challenge track in DSTC11. Our model consists of three major modules: (1) automatic speech recognition error correction to bridge the gap between the spoken and the text utterances, (2) a text-based dialogue system (D3ST) for estimating the slots and values using slot descriptions, and (3) post-processing for recovering from errors in the estimated slot values. Our experiments show that it is important to use an explicit automatic speech recognition error correction module, post-processing, and data augmentation to adapt a text-based dialogue state tracker for spoken dialogue corpora.

HyeongJoo Hwang, Seokin Seo, Youngsoo Jang, Sungyoon Kim, Geon-Hyeong Kim, Seunghoon Hong, and Kee-Eung Kim: Information-Theoretic State Space Model for Multi-View Reinforcement Learning. Proceedings of International Conference on Machine Learning (ICML). 2023. Oral presentation

Multi-View Reinforcement Learning (MVRL) seeks to find an optimal control for an agent given multi-view observations from various sources. Despite recent advances in multi-view learning that aim to extract the latent representation from multi-view data, it is not straightforward to apply them to control tasks, especially when the observations are temporally dependent on one another. The problem can be even more challenging if the observations are intermittently missing for a subset of views. In this paper, we introduce Fuse2Control (F2C), an information-theoretic approach to capturing the underlying state space model from the sequences of multi-view observations. We conduct an extensive set of experiments in various control tasks showing that our method is highly effective in aggregating task-relevant information across many views, and that it scales linearly with the number of views while retaining robustness to arbitrary missing-view scenarios.

Mihye Kim, Jimyung Choi, Jaehyun Kim, Wooyoung Kim, Yeonung Baek, Gisuk Bang, Kwangwoon Son, Yeonman Ryou, and Kee-Eung Kim: Trustworthy Residual Vehicle Value Prediction for Auto Finance. Proceedings of IAAI Technical Track on deployed Highly Innovative Applications of AI. 2023. Innovative application award

The residual value (RV) of a vehicle refers to its estimated worth at some point in the future. It is a core component in every auto financial product, used to determine the credit lines and the leasing rates. As such, an accurate prediction of RV is critical for the auto finance industry, since over-prediction risks revenue loss and under-prediction makes the financial product uncompetitive. Although there are a number of prior studies on training machine learning models on a large amount of used car sales data, we had to cope with real-world operational requirements such as compliance with regulations (i.e., monotonicity of output with respect to a subset of features) and generalization to unseen input (i.e., new and rare car models). In this paper, we describe how we coped with these practical challenges and created value for our business at Hyundai Capital Services, the top auto financial service provider in Korea.

2022

์„œ์„์ธ, ํ™ฉํ˜•์ฃผ, ์–‘ํ™์„, and ๊น€๊ธฐ์‘: ํŠน์ง•์กฐํ•ฉ ๊ต๋ž€์ž ๊ท ํ˜•์„ ํ†ตํ•œ ์ธ๊ณผ์ •๊ทœํ™”๋œ ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€ ๊ฐœ์„ . ํ•œ๊ตญ์†Œํ”„ํŠธ์›จ์–ด์ข…ํ•ฉํ•™์ˆ ๋Œ€ํšŒ(KSC) 2022, ํ•œ๊ตญ์ •๋ณด๊ณผํ•™ํšŒ. 2022. [๐Ÿ“„ Abstract] [โœ๏ธ Paper]

Stable learning aims at learning that is robust to distribution shift between the training data and the test data. In this paper, we present a more general causal regularization method that subsumes Causally-Regularized Logistic Regression (CRLR), one of the existing stable learning methods proposed for binary classification. The existing algorithm treats each single feature as the treatment variable and then applies confounder balancing to the remaining features in order to learn the sample weights. We extend this by treating a combination of features as the treatment variable and then balancing the remaining features. We also show through simple experiments that the proposed method is more effective than the existing method.

Geon-Hyeong Kim*, Jongmin Lee*, Youngsoo Jang, Hongseok Yang, and Kee-Eung Kim: LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation. Advances in Neural Information Processing Systems (NeurIPS). 2022.

We consider the problem of learning from observation (LfO), in which the agent aims to mimic the expert's behavior from the state-only demonstrations by experts. We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities. This offline setting for LfO is appealing in many real-world scenarios where the ground-truth expert actions are inaccessible and the arbitrary environment interactions are costly or risky. In this paper, we present LobsDICE, an offline LfO algorithm that learns to imitate the expert policy via optimization in the space of stationary distributions. Our algorithm solves a single convex minimization problem, which minimizes the divergence between the two state-transition distributions induced by the expert and the agent policy. Through an extensive set of offline LfO tasks, we show that LobsDICE outperforms strong baseline methods.

Haanvid Lee, Jongmin Lee, Yunseon Choi, Wonseok Jeon, Byung-Jun Lee, Yung-Kyun Noh, and Kee-Eung Kim: Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions. Advances in Neural Information Processing Systems (NeurIPS). 2022.

We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic policies in contextual bandits with continuous action spaces. Our work is motivated by practical scenarios where the target policy needs to be deterministic due to domain requirements, such as prescription of treatment dosage and duration in medicine. Although importance sampling (IS) provides a basic principle for OPE, it is ill-posed for the deterministic target policy with continuous actions. Our main idea is to relax the target policy and pose the problem as kernel-based estimation, where we learn the kernel metric in order to minimize the overall mean squared error (MSE). We present an analytic solution for the optimal metric, based on the analysis of bias and variance. Whereas prior work has been limited to scalar action spaces or kernel bandwidth optimization, our work goes a step further by handling vector action spaces and metric optimization. We show that our estimator is consistent and that it significantly reduces the MSE compared to baseline OPE methods through experiments on various domains.
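To make the kernel relaxation concrete, here is a minimal sketch of a kernel-smoothed importance sampling estimator for a deterministic target policy, using a Gaussian kernel with a fixed bandwidth and an optional fixed metric matrix. This is the generic estimator that the relaxation idea leads to, under stated assumptions; it does not learn the metric, and the function name `kernel_ope`, its arguments, and the toy data are hypothetical rather than taken from the paper.

```python
import numpy as np

def kernel_ope(actions, target_actions, behavior_density, rewards,
               bandwidth, metric=None):
    """Kernel-smoothed importance sampling estimate of a deterministic policy's value.

    actions, target_actions: arrays of shape (n, d); logged actions a_i and pi(x_i).
    behavior_density: shape (n,), density of a_i under the behavior policy.
    rewards: shape (n,). metric: optional (d, d) positive-definite matrix A.
    """
    diff = np.asarray(actions, float) - np.asarray(target_actions, float)
    n, d = diff.shape
    A = np.eye(d) if metric is None else np.asarray(metric, float)
    # Squared Mahalanobis distance under A, scaled by the bandwidth.
    sq_dist = np.einsum("ni,ij,nj->n", diff, A, diff) / bandwidth**2
    # Normalizer of a Gaussian with precision A / h^2, so the kernel integrates to 1.
    norm = (2.0 * np.pi) ** (d / 2) * bandwidth**d / np.sqrt(np.linalg.det(A))
    weights = np.exp(-0.5 * sq_dist) / norm / np.asarray(behavior_density, float)
    return float(np.mean(weights * np.asarray(rewards, float)))

# Toy check (illustrative data): uniform behavior policy on [-1, 1],
# target policy always plays a = 0, reward r = 1 - a^2, so the true value is 1.
rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 1.0, size=(20000, 1))
estimate = kernel_ope(a, np.zeros_like(a), np.full(len(a), 0.5),
                      1.0 - a[:, 0] ** 2, bandwidth=0.1)
```

A smaller bandwidth reduces the bias introduced by the relaxation but concentrates the weight on fewer samples and so raises the variance, which is the trade-off the abstract refers to when it minimizes the MSE.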

Sanghoon Myung, In Huh, Wonik Jang, Jae Myung Choe, Jisu Ryu, Daesin Kim, Kee-Eung Kim, and Changwook Jeong: PAC-Net: A Model Pruning Approach to Inductive Transfer Learning. Proceedings of International Conference on Machine Learning (ICML). 2022.

Inductive transfer learning aims to learn from a small amount of training data for the target task by utilizing a pre-trained model from the source task. Most strategies that involve large-scale deep learning models adopt initialization with the pre-trained model and fine-tuning for the target task. However, when using over-parameterized models, we can often prune the model without sacrificing the accuracy of the source task. This motivates us to adopt model pruning for transfer learning with deep learning models. In this paper, we propose PAC-Net, a simple yet effective approach for transfer learning based on pruning. PAC-Net consists of three steps: Prune, Allocate, and Calibrate (PAC). The main idea behind these steps is to identify essential weights for the source task, fine-tune on the source task by updating the essential weights, and then calibrate on the target task by updating the remaining redundant weights. Across a varied and extensive set of inductive transfer learning experiments, we show that our method achieves state-of-the-art performance by a large margin.
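The three steps named above can be sketched on a toy linear regression problem. The sketch below assumes magnitude-based pruning, plain gradient descent on mean squared error, and a hypothetical `keep_ratio` parameter; these are illustrative choices, and the paper's actual pruning criterion, models, and training procedure may differ.

```python
import numpy as np

def gd(X, y, w, trainable, lr=0.1, steps=500):
    """Gradient descent on mean squared error; only entries flagged in `trainable` move."""
    w = w.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad * trainable
    return w

def pac_sketch(Xs, ys, Xt, yt, keep_ratio=0.5):
    """Toy Prune / Allocate / Calibrate pipeline on a linear model (illustrative only)."""
    d = Xs.shape[1]
    # Pre-train all weights on the source task.
    w = gd(Xs, ys, np.zeros(d), np.ones(d, dtype=bool))
    # Prune: keep the largest-magnitude weights as "essential" (assumed criterion).
    k = max(1, int(round(keep_ratio * d)))
    essential = np.zeros(d, dtype=bool)
    essential[np.argsort(-np.abs(w))[:k]] = True
    # Allocate: zero the pruned weights, then fine-tune only the essential ones on the source.
    w_source = gd(Xs, ys, np.where(essential, w, 0.0), essential)
    # Calibrate: freeze the essential weights, update only the remaining ones on the target.
    w_target = gd(Xt, yt, w_source, ~essential)
    return w_target, w_source, essential

# Toy data (illustrative): the target task reuses the source weights and adds two more.
rng = np.random.default_rng(0)
Xs, Xt = rng.normal(size=(200, 6)), rng.normal(size=(200, 6))
true_source = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0])
true_target = np.array([3.0, -2.0, 1.0, 0.0, 0.5, 0.0])
w_target, w_source, essential = pac_sketch(Xs, Xs @ true_source, Xt, Xt @ true_target,
                                           keep_ratio=1 / 3)
```

Because the essential weights are frozen during calibration, the source-task solution is preserved while the previously pruned capacity absorbs what is new in the target task.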

Jinhyeon Kim and Kee-Eung Kim: Data Augmentation for Learning to Play in Text-Based Games. Proceedings of International Joint Conference on Artificial Intelligence (IJCAI). 2022.

Improving generalization in text-based games serves as a useful stepping-stone towards reinforcement learning (RL) agents with generic linguistic ability. Data augmentation for generalization in RL has been shown to be very successful in classic control and visual tasks, but there is no prior work for text-based games. We propose Transition-Matching Permutation, a novel data augmentation technique for text-based games, where we identify phrase permutations that match as many transitions in the trajectory data as possible. We show that applying this technique results in state-of-the-art performance in the Cooking Game benchmark suite for text-based games.

Haeju Lee*, Oh Joon Kwon*, Yunseon Choi*, Minho Park, Ran Han, Yoonhyung Kim, Jinhyeon Kim, Youngjune Lee, Haebin Shin, Kangwook Lee, and Kee-Eung Kim: Learning to Embed Multi-Modal Contexts for Situated Conversational Agents. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) Findings. 2022.

The Situated Interactive Multi-Modal Conversations (SIMMC) 2.0 aims to create virtual shopping assistants that can accept complex multi-modal inputs, i.e., visual appearances of objects and user utterances. It consists of four subtasks: multi-modal disambiguation (MM-Disamb), multi-modal coreference resolution (MM-Coref), multi-modal dialog state tracking (MM-DST), and response retrieval and generation. While many task-oriented dialog systems usually tackle each subtask separately, we propose a jointly learned multi-modal encoder-decoder that incorporates visual inputs and performs all four subtasks at once for efficiency. This approach won the MM-Coref and response retrieval subtasks and was nominated runner-up for the remaining subtasks using a single unified model at the 10th Dialog Systems Technology Challenge (DSTC10), setting a high bar for the novel task of multi-modal task-oriented dialog systems.

Haeju Lee*, Oh Joon Kwon*, Yunseon Choi*, Jinhyeon Kim, Youngjune Lee, Ran Han, Yoonhyung Kim, Minho Park, Kangwook Lee, Haebin Shin, and Kee-Eung Kim: Tackling Situated Multi-Modal Task-Oriented Dialogs with a Single Transformer Model. AAAI Conference on Artificial Intelligence (AAAI) DSTC10 Workshop. 2022. Track best paper

The Situated Interactive Multi-Modal Conversations (SIMMC) 2.0 track in the Dialog System Technology Challenge 10 (DSTC10) aims to create virtual shopping assistants that can accept complex multi-modal inputs, i.e., visual appearances of objects and user utterances. It consists of four subtasks: multi-modal disambiguation (MM-Disamb), multi-modal coreference resolution (MM-Coref), multi-modal dialog state tracking (MM-DST), and response retrieval and generation. While many task-oriented dialog systems usually tackle each subtask separately, we propose a jointly learned encoder-decoder that performs all four subtasks at once for efficiency. Moreover, we handle the multi-modality of the challenge by representing visual objects as special tokens whose joint embedding is learned via auxiliary tasks. Finally, we won the MM-Coref and response retrieval subtasks and were nominated runner-up for the remaining subtasks using a single unified model. In particular, our model achieved 81.5% MRR, 71.2% R@1, 95.0% R@5, 98.2% R@10, and 1.9 mean rank in the response retrieval task along with competitive results in all subtasks, setting a high bar for the state-of-the-art result in SIMMC 2.0.
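For readers unfamiliar with the retrieval metrics quoted above, the sketch below shows how MRR, R@k, and mean rank are conventionally computed from the 1-based rank of the correct response for each query. The function name and the sample ranks are hypothetical, and the challenge's official scorer may differ in details such as tie handling.

```python
from typing import Dict, Sequence

def retrieval_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """Compute MRR, recall@k, and mean rank from 1-based ranks of the correct response."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    n = len(ranks)
    metrics = {
        "MRR": sum(1.0 / r for r in ranks) / n,   # mean reciprocal rank
        "mean_rank": sum(ranks) / n,              # lower is better
    }
    for k in ks:
        # R@k: fraction of queries whose correct response is ranked within the top k.
        metrics[f"R@{k}"] = sum(r <= k for r in ranks) / n
    return metrics

# Illustrative ranks for four queries (not data from the paper).
scores = retrieval_metrics([1, 2, 4, 10])
```

With these four sample ranks, the reciprocal ranks are 1, 0.5, 0.25, and 0.1, so the MRR is their average, 0.4625.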

Sunghoon Hong, Deunsol Yoon, and Kee-Eung Kim: Structure-Aware Transformer Policy for Inhomogeneous Multi-Task Reinforcement Learning. International Conference on Learning Representations (ICLR). 2022.

Modular Reinforcement Learning, where the agent is assumed to be morphologically structured as a graph, for example composed of limbs and joints, aims to learn a policy that is transferable to a structurally similar but different agent. Compared to traditional Multi-Task Reinforcement Learning, this promising approach allows us to cope with inhomogeneous tasks where the state and action space dimensions differ across tasks. Graph Neural Networks are a natural model for representing the pertinent policies, but a recent work has shown that their multi-hop message passing mechanism is not ideal for conveying important information to other modules and thus a transformer model without morphological information was proposed. In this work, we argue that the morphological information is still very useful and propose a transformer policy model that effectively encodes such information. Specifically, we encode the morphological information in terms of the traversal-based positional embedding and the graph-based relational embedding. We empirically show that the morphological information is crucial for modular reinforcement learning, substantially outperforming prior state-of-the-art methods on multi-task learning as well as transfer learning settings with different state and action space dimensions.

Youngsoo Jang, Jongmin Lee, and Kee-Eung Kim: GPT-Critic: Offline Reinforcement Learning for End-to-End Task-Oriented Dialogue Systems. International Conference on Learning Representations (ICLR). 2022.

Training a task-oriented dialogue agent can be naturally formulated as an offline reinforcement learning (RL) problem, where the agent aims to learn a conversational strategy to achieve user goals, only from a dialogue corpus. It is very challenging in terms of RL since the natural language action space is astronomical, while feasible (syntactically and semantically correct) actions are very sparse. Thus, standard RL methods easily fail and generate responses diverging from human language, even when fine-tuning a powerful pre-trained language model. In this paper, we introduce GPT-Critic, an offline RL method for task-oriented dialogue. GPT-Critic is built upon GPT-2, fine-tuning the language model through behavior cloning of the critic-guided self-generated sentences. GPT-Critic is essentially free from the issue of diverging from human language since it learns from the sentences sampled from the pre-trained language model. In the experiments, we demonstrate that our algorithm outperforms the state-of-the-art in the task-oriented dialogue benchmarks including MultiWOZ 2.0 and ConvLab.

Geon-Hyeong Kim, Seokin Seo, Jongmin Lee, Wonseok Jeon, HyeongJoo Hwang, Hongseok Yang, and Kee-Eung Kim: DemoDICE: Offline Imitation Learning with Supplementary Imperfect Demonstrations. International Conference on Learning Representations (ICLR). 2022.

We consider offline imitation learning (IL), which aims to mimic the expert's behavior from its demonstration without further interaction with the environment. One of the main challenges in offline IL is to deal with the narrow support of the data distribution exhibited by the expert demonstrations that cover only a small fraction of the state and the action spaces. As a result, offline IL algorithms that rely only on expert demonstrations are very unstable since the situation easily deviates from those in the expert demonstrations. In this paper, we assume additional demonstration data of unknown degrees of optimality, which we call imperfect demonstrations. Compared with the recent IL algorithms that adopt adversarial minimax training objectives, we substantially stabilize the overall learning process by reducing minimax optimization to a direct convex optimization in a principled manner. Using extensive tasks, we show that DemoDICE achieves promising results in offline IL from expert and imperfect demonstrations.

Jongmin Lee, Cosmin Paduraru, Daniel J. Mankowitz, Nicolas Heess, Doina Precup, Kee-Eung Kim, and Arthur Guez: COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation. International Conference on Learning Representations (ICLR). 2022. Spotlight

We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset. This problem setting is appealing in many real-world scenarios, where direct interaction with the environment is costly or risky, and where the resulting policy should comply with safety constraints. However, it is challenging to compute a policy that guarantees satisfying the cost constraints in the offline RL setting, since the off-policy evaluation inherently has an estimation error. In this paper, we present an offline constrained RL algorithm that optimizes the policy in the space of the stationary distribution. Our algorithm, COptiDICE, directly estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound, with the goal of yielding a cost-conservative policy for actual constraint satisfaction. Experimental results show that COptiDICE attains better policies in terms of constraint satisfaction and return maximization, outperforming baseline algorithms.

2021

HyeongJoo Hwang, Geon-Hyeong Kim, Seunghoon Hong, and Kee-Eung Kim: Multi-View Representation Learning via Total Correlation Objective. Advances in Neural Information Processing Systems (NeurIPS). 2021.

Multi-View Representation Learning (MVRL) aims to discover a shared representation of observations from different views with the complex underlying correlation. In this paper, we propose a variational approach which casts MVRL as maximizing the amount of total correlation reduced by the representation, aiming to learn a shared latent representation that is informative yet succinct to capture the correlation among multiple views. To this end, we introduce a tractable surrogate objective function under the proposed framework, which allows our method to fuse and calibrate the observations in the representation space. From the information theoretic perspective, we show that our framework subsumes existing multi-view generative models. Lastly, we show that our approach straightforwardly extends to the Partial MVRL (PMVRL) setting, where the observations are missing without any regular pattern. We demonstrate the effectiveness of our approach in the multi-view translation and classification tasks, outperforming strong baseline methods.
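For reference, the standard definition of total correlation, and one natural reading of "the amount of total correlation reduced by the representation", can be written as follows. The symbols $X_1,\dots,X_n$ for the views and $Z$ for the shared representation are notation assumed here, and the paper's exact objective and its tractable surrogate may be stated differently.

```latex
% Total correlation of n views: how far the joint is from the product of marginals.
\mathrm{TC}(X_1,\dots,X_n)
  = \sum_{i=1}^{n} H(X_i) - H(X_1,\dots,X_n)
  = D_{\mathrm{KL}}\!\Big( p(x_1,\dots,x_n) \,\Big\|\, \prod_{i=1}^{n} p(x_i) \Big)

% Conditional total correlation given a representation Z.
\mathrm{TC}(X_1,\dots,X_n \mid Z)
  = \sum_{i=1}^{n} H(X_i \mid Z) - H(X_1,\dots,X_n \mid Z)

% Total correlation explained (reduced) by Z; larger means Z captures more of the
% dependence among the views.
\mathrm{TC}(X_1,\dots,X_n) - \mathrm{TC}(X_1,\dots,X_n \mid Z)
```

The reduction is largest when the views become conditionally independent given $Z$, that is, when the representation accounts for all of the correlation among them.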

김건형, 장영수, 이종민, and 김기응: A Study on an Efficient Multi-Task Offline Model-Based Reinforcement Learning Algorithm (효율적인 다중태스크 오프라인 모델기반 강화학습 알고리즘에 대한 연구). Korea Software Congress, Korean Institute of Information Scientists and Engineers (KIISE). 2021.

์˜คํ”„๋ผ์ธ ๊ฐ•ํ™”ํ•™์Šต์€ ์‚ฌ์ „์— ์ˆ˜์ง‘๋œ ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ํ™˜๊ฒฝ๊ณผ์˜ ์ถ”๊ฐ€์ ์ธ ์ƒํ˜ธ์ž‘์šฉ ์—†์ด ์ •์ฑ…์„ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•œ๋‹ค. ํ•˜์ง€๋งŒ ์ด๋Ÿฌํ•œ ํ”„๋ ˆ์ž„์›Œํฌ์—์„œ๋Š” ๋‹จ์ผ ํƒœ์Šคํฌ์—์„œ ์‚ฌ์ „์— ๋งŽ์€ ๋ฐ์ดํ„ฐ๊ฐ€ ์ˆ˜์ง‘๋˜์–ด ์žˆ์–ด์•ผ ํ•œ๋‹ค๋Š” ์ œ์•ฝ์ด ์žˆ์œผ๋ฉฐ, ์ด๋ฅผ ์™„ํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค์–‘ํ•œ ํƒœ์Šคํฌ์—์„œ ์ˆ˜์ง‘๋œ ๋ฐ์ดํ„ฐ๋“ค์„ ํ™œ์šฉํ•˜๋Š” ๋‹ค์ค‘ํƒœ์Šคํฌ ์˜คํ”„๋ผ์ธ ๊ฐ•ํ™”ํ•™์Šต ๋ฌธ์ œ๋ฅผ ์ƒ๊ฐํ•ด ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋Ÿฌํ•œ ๋‹ค์ค‘ํƒœ์Šคํฌ ์˜คํ”„๋ผ์ธ ๊ฐ•ํ™”ํ•™์Šต ๋ฌธ์ œ์—์„œ ๋‹ค๋ฅธ ํƒœ์Šคํฌ์˜ ๋ฐ์ดํ„ฐ๋ฅผ ํšจ์œจ์ ์œผ๋กœ ํ™œ์šฉํ•˜๋Š” ๋ชจ๋ธ๊ธฐ๋ฐ˜ ๊ฐ•ํ™”ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ธ MT-OMBRL์€ ํƒœ์Šคํฌ ์‚ฌ์ด์˜ ๋™์—ญํ•™ ์ •๋ณด๋ฅผ ๊ณต์œ ํ•˜์—ฌ ๊ฐ ํƒœ์Šคํฌ๋ฅผ ๋…๋ฆฝ์ ์œผ๋กœ ํ•ด๊ฒฐํ•˜๋Š” ์˜คํ”„๋ผ์ธ ๊ฐ•ํ™”ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋Œ€๋น„ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋ณด์ธ๋‹ค.

Youngjune Lee, Oh Joon Kwon, Haeju Lee, Joonyoung Kim, Kangwook Lee, and Kee-Eung Kim: Augment & Valuate: A Data Enhancement Pipeline for Data-Centric AI. Neural Information Processing Systems (NeurIPS) Data-Centric AI workshop. 2021. Honorable mention

Data scarcity and noise are important issues in industrial applications of machine learning. However, it is often challenging to devise a scalable and generalized approach to address the fundamental distributional and semantic properties of datasets with black-box models. For this reason, data-centric approaches are crucial for the automation of the machine learning operations pipeline. In order to serve as the basis for this automation, we suggest a domain-agnostic pipeline for refining the quality of data in image classification problems. This pipeline contains data valuation, cleansing, and augmentation. With an appropriate combination of these methods, we achieved 84.711% test accuracy (ranked #6, Honorable Mention in the Most Innovative category) in the Data-Centric AI competition using only the provided dataset.

Youngjune Lee and Kee-Eung Kim: Dual Correction Strategy for Ranking Distillation in Top-N Recommender System. Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM). 2021.

Knowledge Distillation (KD), which transfers the knowledge of a well-trained large model (teacher) to a small model (student), has become an important area of research for practical deployment of recommender systems. Recently, Relaxed Ranking Distillation (RRD) has shown that distilling the ranking information in the recommendation list significantly improves the performance. However, the method still has limitations in that 1) it does not fully utilize the prediction errors of the student model, which makes the training not fully efficient, and 2) it only distills the user-side ranking information, which provides an insufficient view under sparse implicit feedback. This paper presents the Dual Correction strategy for Distillation (DCD), which transfers the ranking information from the teacher model to the student model in a more efficient manner. Most importantly, DCD uses the discrepancy between the teacher model and the student model predictions to decide which knowledge to distill. By doing so, DCD essentially provides learning guidance tailored to "correcting" what the student model has failed to accurately predict. This process is applied for transferring the ranking information from the user side as well as the item side to address sparse implicit user feedback. Our experiments show that the proposed method outperforms the state-of-the-art baselines, and ablation studies validate the effectiveness of each component.

Jongmin Lee*, Wonseok Jeon*, Byung-Jun Lee, Joelle Pineau, and Kee-Eung Kim: OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation. Proceedings of the International Conference on Machine Learning (ICML). 2021.

We consider the offline reinforcement learning (RL) setting where the agent aims to optimize the policy solely from the data without further environment interactions. In offline RL, the distributional shift becomes the primary source of difficulty, which arises from the deviation of the target policy being optimized from the behavior policy used for data collection. This typically causes overestimation of action values, which poses severe problems for model-free algorithms that use bootstrapping. To mitigate the problem, prior offline RL algorithms often used sophisticated techniques that encourage underestimation of action values, which introduces an additional set of hyperparameters that need to be tuned properly. In this paper, we present an offline RL algorithm that prevents overestimation in a more principled way. Our algorithm, OptiDICE, tightly integrates the optimization of the target policy and the stationary distribution ratio estimation of the target policy and the behavior policy. Using an extensive set of benchmark datasets for offline RL, we show that OptiDICE performs competitively with the state-of-the-art methods.

Jongmin Lee*, Wonseok Jeon*, Byung-Jun Lee, Joelle Pineau, and Kee-Eung Kim: OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation. A Roadmap to Never-Ending RL Workshop at ICLR. 2021. [📄 Abstract] [✏️ Paper]

We consider the offline reinforcement learning (RL) setting where the agent aims to optimize the policy solely from the data without further environment interactions. In offline RL, the distributional shift becomes the primary source of difficulty, which arises from the deviation of the target policy being optimized from the behavior policy used for data collection. This typically causes overestimation of action values, which poses severe problems for model-free algorithms that use bootstrapping. To mitigate the problem, prior offline RL algorithms often used sophisticated techniques that encourage underestimation of action values, which introduces an additional set of hyperparameters that need to be tuned properly. In this paper, we present an offline RL algorithm that prevents overestimation in a more principled way. Our algorithm, OptiDICE, tightly integrates the optimization of the target policy and the stationary distribution ratio estimation of the target policy and the behavior policy. Using an extensive set of benchmark datasets for offline RL, we show that OptiDICE performs competitively with the state-of-the-art methods.

Jinhyeon Kim, Donghoon Ham, Jeong-Gwan Lee, and Kee-Eung Kim: End-to-End Document-Grounded Conversation with Encoder-Decoder Pre-Trained Language Model. AAAI Conference on Artificial Intelligence (AAAI) DSTC9 Workshop. 2021. [📄 Abstract] [✏️ Paper]

The first track of the Ninth Dialog System Technology Challenge (DSTC9), “Beyond Domain APIs: Task-Oriented Conversational Modeling with Unstructured Knowledge Access,” encourages the participants to build goal-oriented dialog systems with access to unstructured knowledge, thereby making it possible to handle diverse user inquiries outside the scope of API/DBs. It consists of three sub-tasks: knowledge-seeking turn detection, knowledge selection, and knowledge-grounded response generation. We claim that tackling these sub-tasks separately is neither parameter-efficient nor better-performing. In this paper, we present an end-to-end document-grounded conversation system that utilizes a pretrained language model with an encoder-decoder structure. In the human evaluation, our dialog system achieved an accuracy score of 4.3082 and an appropriateness score of 4.2665, which ranked 9th out of 24 participating teams. Furthermore, we conduct an ablation study and show that the end-to-end encoder-decoder scheme enables more efficient use of parameters in the document-grounded conversation setting.

Deunsol Yoon*, Sunghoon Hong*, Byung-Jun Lee, and Kee-Eung Kim: Winning the L2RPN Challenge: Power Grid Management via Semi-Markov Afterstate Actor-Critic. International Conference on Learning Representations (ICLR). 2021. Spotlight [📄 Abstract] [✏️ Paper]

Safe and reliable electricity transmission in power grids is crucial for modern society. It is thus quite natural that there has been a growing interest in the automatic management of power grids, exemplified by the Learning to Run a Power Network Challenge (L2RPN), modeling the problem as a reinforcement learning (RL) task. However, it is highly challenging to manage a real-world scale power grid, mostly due to the massive scale of its state and action space. In this paper, we present an off-policy actor-critic approach that effectively tackles the unique challenges in power grid management by RL, adopting the hierarchical policy together with the afterstate representation. Our agent ranked first in the latest challenge (L2RPN WCCI 2020), being able to avoid disastrous situations while maintaining the highest level of operational efficiency in every test scenario. This paper provides a formal description of the algorithmic aspect of our approach, as well as further experimental studies on diverse power grids.

Youngsoo Jang, Seokin Seo, Jongmin Lee, and Kee-Eung Kim: Monte-Carlo Planning and Learning with Language Action Value Estimates. International Conference on Learning Representations (ICLR). 2021. [📄 Abstract] [✏️ Paper]

Interactive Fiction (IF) games provide a useful testbed for language-based reinforcement learning agents, posing significant challenges of natural language understanding, commonsense reasoning, and non-myopic planning in the combinatorial search space. Agents based on standard planning algorithms struggle to play IF games due to the massive search space of language actions. Thus, language-grounded planning is a key ability of such agents, since inferring the consequence of language action based on semantic understanding can drastically improve search. In this paper, we introduce Monte-Carlo planning with Language Action Value Estimates (MC-LAVE) that combines a Monte-Carlo tree search with language-driven exploration. MC-LAVE invests more search effort into semantically promising language actions using locally optimistic language value estimates, yielding a significant reduction in the effective search space of language actions. We then present a reinforcement learning approach via MC-LAVE, which alternates between MC-LAVE planning and supervised learning of the self-generated language actions. In the experiments, we demonstrate that our method achieves new high scores in various IF games.

Byung-Jun Lee, Jongmin Lee, and Kee-Eung Kim: Representation Balancing Offline Model-based Reinforcement Learning. International Conference on Learning Representations (ICLR). 2021. [📄 Abstract] [✏️ Paper]

One of the main challenges in offline and off-policy reinforcement learning is to cope with the distribution shift that arises from the mismatch between the target policy and the data collection policy. In this paper, we focus on a model-based approach, particularly on learning the representation for a robust model of the environment under the distribution shift, which has been first studied by Representation Balancing MDP (RepBM). Although this prior work has shown promising results, there are a number of shortcomings that still hinder its applicability to practical tasks. In particular, we address the curse of horizon exhibited by RepBM, rejecting most of the pre-collected data in long-term tasks. We present a new objective for model learning motivated by recent advances in the estimation of stationary distribution corrections. This effectively overcomes the aforementioned limitation of RepBM, as well as naturally extending to continuous action spaces and stochastic policies. We also present an offline model-based policy optimization using this new objective, yielding the state-of-the-art performance in a representative set of benchmark offline RL tasks.
2020

Byung-Jun Lee, Jongmin Lee, Yunseon Choi, Youngsoo Jang, and Kee-Eung Kim: A Study on Applying the Efficient Lifelong Learning Algorithm to Model-Based Reinforcement Learning (in Korean). Korea Software Congress (KSC), Korean Institute of Information Scientists and Engineers (KIISE). 2020. [📄 Abstract] [✏️ Paper]

Lifelong learning is the problem of learning several different tasks in sequence, and it is very important for research on general-purpose artificial intelligence agents. This paper aims to apply the Efficient Lifelong Learning Algorithm (ELLA), a well-known lifelong learning algorithm from supervised learning, to model-based reinforcement learning. The proposed algorithm, MB-ELRL, has access to only one task at a time, yet it efficiently learns the information that can be shared among the dynamics of the tasks and achieves far better performance than learning each task independently.

HyeongJoo Hwang, Geon-Hyeong Kim, Seunghoon Hong, and Kee-Eung Kim: Variational Interaction Information Maximization for Cross-domain Disentanglement. Advances in Neural Information Processing Systems (NeurIPS). 2020. [📄 Abstract] [✏️ Paper]

Cross-domain disentanglement is the problem of learning representations partitioned into domain-invariant and domain-specific representations, which is a key to successful domain transfer or measuring semantic distance between two domains. Grounded in information theory, we cast the simultaneous learning of domain-invariant and domain-specific representations as a joint objective of multiple information constraints, which does not require adversarial training or gradient reversal layers. We derive a tractable bound of the objective and propose a generative model named Interaction Information Auto-Encoder (IIAE). Our approach reveals insights on the desirable representation for cross-domain disentanglement and its connection to Variational Auto-Encoder (VAE). We demonstrate the validity of our model in learning disentangled representations with the image-to-image translation and the cross-domain retrieval tasks. We further show that our model achieves the state-of-the-art performance in the zero-shot sketch based image retrieval task, even without external knowledge.

Jongmin Lee, Byung-Jun Lee, and Kee-Eung Kim: Reinforcement Learning for Control with Multiple Frequencies. Advances in Neural Information Processing Systems (NeurIPS). 2020. [📄 Abstract] [✏️ Paper]

Many real-world sequential decision problems involve multiple action variables whose control frequencies are different, such that actions take their effects at different periods. While these problems can be formulated with the notion of multiple action persistences in factored-action MDP (FA-MDP), it is non-trivial to solve them efficiently since an action-persistent policy constructed from a stationary policy can be arbitrarily suboptimal, rendering solution methods for the standard FA-MDPs hardly applicable. In this paper, we formalize the problem of multiple control frequencies in RL and provide its efficient solution method. Our proposed method, Action-Persistent Policy Iteration (AP-PI), provides a theoretical guarantee on the convergence to an optimal solution while incurring only a factor of $|\mathcal{A}|$ increase in time complexity during the policy improvement step, compared to the standard policy iteration for FA-MDPs. Extending this result, we present Action-Persistent Actor-Critic (AP-AC), a scalable RL algorithm for high-dimensional control tasks. In the experiments, we demonstrate that AP-AC significantly outperforms the baselines on several continuous control tasks and a traffic control simulation, which highlights the effectiveness of our method that directly optimizes the periodic non-stationary policy for tasks with multiple control frequencies.

Geon-Hyeong Kim, Youngsoo Jang, Hongseok Yang, and Kee-Eung Kim: Variational Inference for Sequential Data with Future Likelihood Estimates. Proceedings of the International Conference on Machine Learning (ICML). 2020. [📄 Abstract] [✏️ Paper]

The recent development of flexible and scalable variational inference algorithms has popularized the use of deep probabilistic models in a wide range of applications. However, learning and reasoning about high-dimensional models with non-differentiable densities are still a challenge. For such a model, inference algorithms struggle to estimate the gradients of variational objectives accurately, due to high variance in their estimates. To tackle this challenge, we present a novel variational inference algorithm for sequential data, which performs well even when the density from the model is not differentiable, for instance, due to the use of discrete random variables. The key feature of our algorithm is that it estimates future likelihoods at all time steps. The estimated future likelihoods form the core of our new low-variance gradient estimator. We formally analyze our gradient estimator from the perspective of variational objective, and show the effectiveness of our algorithm with synthetic and real datasets.

Byung-Jun Lee*, Jongmin Lee*, Peter Vrancx, Dongho Kim, and Kee-Eung Kim: Batch Reinforcement Learning with Hyperparameter Gradients. Proceedings of the International Conference on Machine Learning (ICML). 2020. [📄 Abstract] [✏️ Paper] [🧑‍💻 Code]

We consider the batch reinforcement learning problem where the agent needs to learn only from a fixed batch of data, without further interaction with the environment. In such a scenario, we want to prevent the optimized policy from deviating too much from the data collection policy since the estimation becomes highly unstable otherwise due to the off-policy nature of the problem. However, imposing this requirement too strongly will result in a policy that merely follows the data collection policy. Unlike prior work where this trade-off is controlled by hand-tuned hyperparameters, we propose a novel batch reinforcement learning approach, batch optimization of policy and hyperparameter (BOPAH), that uses a gradient-based optimization of the hyperparameter using held-out data. We show that BOPAH outperforms other batch reinforcement learning algorithms in tabular and continuous control tasks, by finding a good balance to the trade-off between adhering to the data collection policy and pursuing the possible policy improvement.

Donghoon Ham*, Jeong-Gwan Lee*, Youngsoo Jang, and Kee-Eung Kim: End-to-End Neural Pipeline for Goal-Oriented Dialogue System using GPT-2. Annual Conference of the Association for Computational Linguistics (ACL). 2020. [📄 Abstract] [✏️ Paper]

The goal-oriented dialogue system needs to be optimized for tracking the dialogue flow and carrying out an effective conversation under various situations to meet the user goal. The traditional approach to build such a dialogue system is to take a pipelined modular architecture, where its modules are optimized individually. However, such an optimization scheme does not necessarily yield the overall performance improvement of the whole system. On the other hand, end-to-end dialogue systems with monolithic neural architecture are often trained only with input-output utterances, without taking into account the entire annotations available in the corpus. This scheme makes it difficult for goal-oriented dialogues where the system needs to be integrated with external systems or to provide interpretable information about why the system generated a particular response. In this paper, we present an end-to-end neural architecture for dialogue systems that addresses both challenges above. In the human evaluation, our dialogue system achieved the success rate of 68.32%, the language understanding score of 4.149, and the response appropriateness score of 4.287, which ranked the system at the top position in the end-to-end multi-domain dialogue system task in the 8th dialogue systems technology challenge (DSTC8).

Donghoon Ham*, Jeong-Gwan Lee*, Youngsoo Jang, and Kee-Eung Kim: End-to-End Neural Pipeline for Goal-Oriented Dialogue System using GPT-2. AAAI Conference on Artificial Intelligence (AAAI) DSTC8 Workshop. 2020. [📄 Abstract] [✏️ Paper]

The first sub-task in the multi-domain task-completion dialogue challenge track in the 8th dialogue systems technology challenge (DSTC8) requires participants to build an end-to-end dialogue system that is capable of complex multi-domain dialogues. The traditional approach to build such a dialogue system is to take a pipelined architecture, where its modular components are optimized individually. However, such an optimization scheme does not necessarily yield the overall performance improvement of the whole system. On the other hand, most end-to-end dialogue systems with monolithic neural architecture are trained only with input-output utterances, without taking into account the entire annotations available in the corpus. This scheme makes it difficult for goal-oriented dialogues where the system needs to interact with external systems such as database engines or to provide interpretable information about why the system decided to generate a particular response. In this paper, we present an end-to-end neural architecture for dialogue systems that addresses both challenges above. In the official human evaluation, our dialogue system achieved the success rate of 68.32%, the language understanding score of 4.149, and the response appropriateness score of 4.287, which ranked the system at the top position in all performance evaluation criteria.

Byung-Jun Lee, Seunghoon Hong, and Kee-Eung Kim: Residual Neural Processes. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2020. [📄 Abstract] [✏️ Paper]

A Neural Process (NP) is a map from a set of observed input-output pairs to a predictive distribution over functions, which is designed to mimic other stochastic processes' inference mechanisms. NPs are shown to work effectively in tasks that require complex distributions, where traditional stochastic processes struggle, e.g. image completion tasks. This paper concerns the practical capacity of set function approximators despite their universality. By delving deeper into the relationship between an NP and a Bayesian last layer (BLL), it is possible to see that NPs may struggle in simple examples, which other stochastic processes can easily solve. In this paper, we propose a simple yet effective remedy; the Residual Neural Process (RNP) that leverages traditional BLL for faster training and better prediction. We demonstrate that the RNP shows faster convergence and better performance, both qualitatively and quantitatively.

Youngsoo Jang, Jongmin Lee, and Kee-Eung Kim: Bayes-Adaptive Monte-Carlo Planning and Learning for Goal-Oriented Dialogues. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2020. [📄 Abstract] [✏️ Paper]

We consider a strategic dialogue task, where the ability to infer the other agent's goal is critical to the success of the conversational agent. While this problem can be naturally formulated as Bayesian planning, it is known to be a very difficult problem due to its enormous search space consisting of all possible utterances. In this paper, we introduce an efficient Bayes-adaptive planning algorithm for goal-oriented dialogues, which combines RNN-based dialogue generation and MCTS-based Bayesian planning in a novel way, leading to robust decision-making under the uncertainty of the other agent's goal. We then introduce reinforcement learning for the dialogue agent that uses MCTS as a strong policy improvement operator, casting reinforcement learning as iterative alternation of planning and supervised-learning of self-generated dialogues. In the experiments, we demonstrate that our Bayes-adaptive dialogue planning agent significantly outperforms the state-of-the-art in a negotiation dialogue domain. We also show that reinforcement learning via MCTS further improves end-task performance without diverging from human language.

Jongmin Lee, Wonseok Jeon, Geon-Hyeong Kim, and Kee-Eung Kim: Monte-Carlo Tree Search in Continuous Action Spaces with Value Gradients. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2020. [📄 Abstract] [✏️ Paper]

Monte-Carlo Tree Search (MCTS) is the state-of-the-art online planning algorithm for large problems with discrete action spaces. However, many real-world problems involve continuous action spaces, where MCTS is not as effective as in discrete action spaces. This is mainly due to common practices such as coarse discretization of the entire action space and failure to exploit local smoothness. In this paper, we introduce Value-Gradient UCT (VG-UCT), which combines traditional MCTS with gradient-based optimization of action particles. VG-UCT simultaneously performs a global search via UCT with respect to the finitely sampled set of actions and performs a local improvement via action value gradients. In the experiments, we demonstrate that our approach outperforms existing MCTS methods and other strong baseline algorithms for continuous action spaces.
2019

Geon-Hyeong Kim, Youngsoo Jang, Jongmin Lee, and Kee-Eung Kim: Variance Reduction Methods for Monte Carlo Objectives (in Korean). Korean Institute of Communications and Information Sciences (KICS) Summer Conference. 2019. [📄 Abstract] [✏️ Paper]

This paper studies a method for efficiently training deep generative models that use discrete latent variables. We present a correspondence with reinforcement learning that arises when deep generative models are trained with various learning objectives. This correspondence makes the various variance reduction methods from existing reinforcement learning applicable, through which we seek to improve performance.

Jongmin Lee, Geon-Hyeong Kim, and Kee-Eung Kim: A Study on Monte-Carlo Tree Search in Continuous Action Spaces (in Korean). Korean Institute of Communications and Information Sciences (KICS) Summer Conference. 2019. [📄 Abstract] [✏️ Paper]

Monte-Carlo Tree Search (MCTS) is an online planning algorithm that has achieved great success on a variety of problems with discrete action spaces, but it has not been a method of first choice in continuous action spaces. This is because a coarse discretization of the action space is unavoidable in order to make tree search feasible. In this paper, we address a method that combines the global search of UCT with a local search that exploits derivative information of the environment in continuous action spaces. Experimental results on benchmark control problems with continuous action spaces show that the proposed method outperforms a variety of baseline algorithms.

Nianyin Zeng, Zidong Wang, Hong Zhang, Kee-Eung Kim, Yurong Li, and Xiaohui Liu: An Improved Particle Filter With a Novel Hybrid Proposal Distribution for Quantitative Analysis of Gold Immunochromatographic Strips. IEEE Transactions on Nanotechnology, 18:819-829. 2019. [📄 Abstract] [✏️ Paper]

In this paper, a novel statistical pattern recognition method is proposed for accurately segmenting test and control lines from gold immunochromatographic strip (GICS) images for the benefit of quantitative analysis. A new dynamic state-space model is established, based on which the segmentation task of test and control lines is transformed into a state estimation problem. Specifically, the transition equation is utilized to describe the relationship between contour points on the upper and the lower boundaries of test and control lines, and a new observation equation is developed by combining the contrast of between-class variance and the uniformity measure. Then, an innovative particle filter (PF) with a hybrid proposal distribution, namely, the deep-belief-network-based particle filter (DBN-PF), is put forward, where the deep belief network (DBN) provides an initial recognition result in the hybrid proposal distribution, and the particle swarm optimization algorithm moves particles to regions of high likelihood. The performance of the proposed DBN-PF method is comprehensively evaluated on not only an artificial dataset but also the GICS images in terms of several indices as compared to the PF and DBN methods. It is demonstrated via experimental results that the proposed approach is effective in quantitative analysis of GICS.

Yung-Kyun Noh, Ji Young Park, Byoung Geol Choi, Kee-Eung Kim, and Seung-Woon Rha: A Machine Learning-Based Approach for the Prediction of Acute Coronary Syndrome Requiring Revascularization. Journal of Medical Systems. 2019. [📄 Abstract] [✏️ Paper]

The aim of this study is to predict acute coronary syndrome (ACS) requiring revascularization in patients presenting with early-stage angina-like symptoms using machine learning algorithms. We obtained data from 2344 ACS patients who required revascularization and from 3538 non-ACS patients. We analyzed 20 features that are relevant to ACS using standard algorithms, support vector machines and linear discriminant analysis. Based on feature pattern and filter characteristics, we analyzed and extracted a strong prediction function out of the 20 selected features. The obtained prediction functions are relevant, showing an area under the curve of 0.860 for the prediction of ACS requiring revascularization. Some features are missing from many records though they are considered to be very informative; it turned out that omitting those features from the input and using more data without those features for training improves the prediction accuracy. Additionally, from the investigation using the receiver operating characteristic curves, a reliable prediction of 2.60% of non-ACS patients could be made with a specificity of 1.0. For those 2.60% of non-ACS patients, we can consider recommending medical treatment without risking misdiagnosis of the patients requiring revascularization. We investigated a prediction algorithm to select ACS patients requiring revascularization and non-ACS patients presenting with angina-like symptoms at an early stage. In the future, a large cohort study is necessary to increase the prediction accuracy and confirm the possibility of safely discriminating the non-ACS patients from the ACS patients with confidence.

Youngsoo Jang, Jongmin Lee, and Kee-Eung Kim: Bayes-Adaptive Monte-Carlo Planning and Learning for Goal-Oriented Dialogues. Neural Information Processing Systems (NeurIPS) Conversational AI workshop. 2019. [📄 Abstract] [✏️ Paper]

We consider a strategic dialogue task, where the ability to infer the other agent's goal is critical to the success of the conversational agent. While this problem can be naturally formulated as Bayesian planning, it is known to be a very difficult problem due to its enormous search space consisting of all possible utterances. In this paper, we propose an efficient Bayes-adaptive planning algorithm for goal-oriented dialogues, which combines RNN-based dialogue generation and MCTS-based Bayesian planning in a novel way, leading to robust decision-making under the uncertainty of the other agent's goal. We then introduce reinforcement learning for the dialogue agent that uses MCTS as a strong policy improvement operator, casting reinforcement learning as iterative alternation of planning and supervised-learning of self-generated dialogues. In the experiments, we demonstrate that our Bayes-adaptive dialogue planning agent significantly outperforms the state-of-the-art in a negotiation dialogue domain. We also show that reinforcement learning via MCTS further improves end-task performance without diverging from human language.

Geon-Hyeong Kim, Youngsoo Jang, Jongmin Lee, Wonseok Jeon, Hongseok Yang, and Kee-Eung Kim: Trust Region Sequential Variational Inference. Proceedings of Asian Conference on Machine Learning (ACML). 2019. [📄 Abstract] [✏️ Paper]

Stochastic variational inference has emerged as an effective method for performing inference on or learning complex models for data. Yet, one of the challenges in stochastic variational inference is handling high-dimensional data, such as sequential data, and models with non-differentiable densities caused by, for instance, the use of discrete latent variables. In such cases, it is challenging to control the variance of the gradient estimator used in stochastic variational inference, while low variance is often one of the key properties needed for successful inference. In this work, we present a new algorithm for stochastic variational inference of sequential models which trades off bias for variance to tackle this challenge effectively. Our algorithm is inspired by variance reduction techniques in reinforcement learning, yet it uniquely adopts their key ideas in the context of stochastic variational inference. We demonstrate the effectiveness of our approach through formal analysis and experiments on synthetic and real-world datasets.

Youngsoo Jang*, Jongmin Lee*, Jaeyoung Park*, Kyeng-Hun Lee, Pierre Lison, and Kee-Eung Kim: PyOpenDial: A Python-based Domain-Independent Toolkit for Developing Spoken Dialogue Systems with Probabilistic Rules. Proceedings of Empirical Methods in Natural Language Processing (EMNLP) System Demonstrations. 2019. [📄 Abstract] [✏️ Paper]

We present PyOpenDial, a Python-based domain-independent, open-source toolkit for spoken dialogue systems. Recent advances in core components of dialogue systems, such as speech recognition, language understanding, dialogue management, and language generation, harness deep learning to achieve state-of-the-art performance. The original OpenDial, implemented in Java, provides a plugin architecture to integrate external modules, but lacks Python bindings, making it difficult to interface with popular deep learning frameworks such as TensorFlow or PyTorch. To this end, we re-implemented OpenDial in Python and extended the toolkit with a number of novel functionalities for neural dialogue state tracking and action planning. We describe the overall architecture and its extensions, and illustrate their use on an example where the system response model is implemented with a recurrent neural network.

Mingu Kang and Kee-Eung Kim: Learning a Controller for Ultra-High-Speed Flight Vehicles Using Reinforcement Learning (in Korean). Korea Institute of Military Science and Technology (KIMST) Annual Conference. 2019. [📄 Abstract] [✏️ Paper]

Factors such as lift and drag that arise when a vehicle flies at ultra-high speed introduce strong nonlinearity into the system. In this study, we show that a data-driven, locally optimal vehicle controller can be learned even in an environment with such nonlinearity. By applying a data-driven optimal control methodology (reinforcement learning), this study differentiates itself from traditional model-based control-theoretic approaches. Although the results of this study are at the proof-of-concept level, further performance improvements in vehicle control can be expected through additional hyperparameter tuning and the use of more computing resources.

Kanghoon Lee, Geon-Hyeong Kim, Pedro Ortega, Daniel D. Lee, and Kee-Eung Kim: Bayesian optimistic Kullback-Leibler exploration. Machine Learning Journal (MLJ), 108. 2019. [📄 Abstract] [🔗 Link]

We consider a Bayesian approach to model-based reinforcement learning, where the agent uses a distribution of environment models to find the action that optimally trades off exploration and exploitation. Unfortunately, it is intractable to find the Bayes-optimal solution to the problem except for restricted cases. In this paper, we present BOKLE, a simple algorithm that uses Kullback–Leibler divergence to constrain the set of plausible models for guiding the exploration. We provide a formal analysis that this algorithm is near Bayes-optimal with high probability. We also show an asymptotic relation between the solution pursued by BOKLE and a well-known algorithm called Bayesian exploration bonus. Finally, we show experimental results that clearly demonstrate the exploration efficiency of the algorithm.
2018

Geon-Hyeong Kim, Youngsoo Jang, Jongmin Lee, and Kee-Eung Kim: A Study on Extending Model-Based Bayesian Reinforcement Learning to Continuous Domains. Proceedings of the Korean Institute of Communications and Information Sciences (KICS) Summer Conference. 2018. [📄 Abstract] [🔗 Link]

Existing model-based Bayesian reinforcement learning has been applicable only to small, restricted domains; this paper studies how to apply it to more general, continuous domains. To this end, we use variational inference within existing model-based Bayesian reinforcement learning to enable posterior updates in continuous domains. With this, we aim to apply model-based Bayesian reinforcement learning to continuous domains.

Wonseok Jeon, Seokin Seo, and Kee-Eung Kim: A Bayesian Approach to Generative Adversarial Imitation Learning. Advances in Neural Information Processing Systems (NeurIPS). 2018. Spotlight [📄 Abstract] [✏️ Paper]

Generative adversarial training for imitation learning has shown promising results on high-dimensional and continuous control tasks. This paradigm is based on reducing the imitation learning problem to the density matching problem, where the agent iteratively refines the policy to match the empirical state-action visitation frequency of the expert demonstration. Although this approach can robustly learn to imitate even with scarce demonstration, one must still address the inherent challenge that collecting trajectory samples in each iteration is a costly operation. To address this issue, we first propose a Bayesian formulation of generative adversarial imitation learning (GAIL), where the imitation policy and the cost function are represented as stochastic neural networks. Then, we show that we can significantly enhance the sample efficiency of GAIL leveraging the predictive density of the cost, on an extensive set of imitation learning tasks with high-dimensional states and actions.

Jongmin Lee, Geon-Hyeong Kim, Pascal Poupart, and Kee-Eung Kim: Monte-Carlo Tree Search for Constrained POMDPs. Advances in Neural Information Processing Systems (NeurIPS). 2018. [📄 Abstract] [✏️ Paper]

Monte-Carlo Tree Search (MCTS) has been successfully applied to very large POMDPs, a standard model for stochastic sequential decision-making problems. However, many real-world problems inherently have multiple goals, where multi-objective formulations are more natural. The constrained POMDP (CPOMDP) is such a model that maximizes the reward while constraining the cost, extending the standard POMDP model. To date, solution methods for CPOMDPs assume an explicit model of the environment, and thus are hardly applicable to large-scale real-world problems. In this paper, we present CC-POMCP (Cost-Constrained POMCP), an online MCTS algorithm for large CPOMDPs that leverages the optimization of LP-induced parameters and only requires a black-box simulator of the environment. In the experiments, we demonstrate that CC-POMCP converges to the optimal stochastic action selection in CPOMDP and pushes the state-of-the-art by being able to scale to very large problems.

Eun Sang Cha, Kee-Eung Kim, Stefano Longo, and Ankur Mehta: OP-CAS: Collision Avoidance with Overtaking Maneuvers. Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC). 2018. [📄 Abstract] [✏️ Paper]

This paper presents a novel collision avoidance system for autonomous vehicles based on overtaking procedures. The proposed Overtaking Procedure for Collision Avoidance Systems (OP-CAS) takes a behavioral cloning-based approach that uses images obtained from a low-cost monocular camera. The algorithm selectively records the expert's corrective driving behavior during data collection. This is done by recording oscillatory driving behavior while the vehicle is returning to the center of the lane. This data augmentation method addresses the issue of covariate shift commonly found in behavioral cloning methods. The approach is computationally inexpensive, making it a viable option for real-time embedded deployment. A feasibility study was performed with two remotely controlled scaled vehicles as a proof of concept. Results showed that when two expert drivers demonstrated overtaking behaviors for data collection, even a small dataset was sufficient to model the overtaking sequence. The overtaking maneuvers were deployed in real time on 1/8th scale RC platforms, validating OP-CAS for civilian vehicle safety applications.

MinKu Kang and Kee-Eung Kim: Simulated Physics for High Speed Aerial Systems. Proceedings of International Conference on Control, Automation and Systems (ICCAS). 2018. [📄 Abstract] [✏️ Paper]

In this work, we introduce a model of an aerial system based on a physics-based simulation engine. We investigate some basic properties of the proposed model, showing its potential benefit for autonomous control.

Jongmin Lee, Geon-Hyeong Kim, Pascal Poupart, and Kee-Eung Kim: Monte-Carlo Tree Search for Constrained MDPs. ICML/IJCAI/AAMAS Workshop on Planning and Learning (PAL). 2018. [📄 Abstract] [✏️ Paper]

Monte-Carlo Tree Search (MCTS) is the state-of-the-art online planning algorithm for very large MDPs. However, many real-world problems inherently have multiple goals, where multi-objective sequential decision models are more natural. The constrained MDP (CMDP) is such a model that maximizes the reward while constraining the cost. The common solution method for CMDPs is linear programming (LP), which is hardly applicable to large real-world problems. In this paper, we present CCUCT (Cost-Constrained UCT), an online planning algorithm for large constrained MDPs (CMDPs) that leverages the optimization of LP-induced parameters. We show that CCUCT converges to the optimal stochastic action selection in CMDPs and it is able to solve very large CMDPs through experiments on the multi-objective version of an Atari 2600 arcade game.

Youngsoo Jang, Jiyeon Ham, Byung-Jun Lee, and Kee-Eung Kim: Cross-language Neural Dialog State Tracker for Large Ontologies using Hierarchical Attention. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP). 2018. [📄 Abstract] [🔗 Link]

Dialog state tracking, which refers to identifying the user intent from utterances, is one of the most important tasks in dialog management. In this paper, we present our dialog state tracker developed for the Fifth Dialog State Tracking Challenge, which focused on cross-language adaptation using very scarce machine-translated training data when compared to the size of the ontology. Our dialog state tracker is based on the bi-directional long short-term memory network with a hierarchical attention mechanism in order to spot important words in user utterances. The user intent is predicted by finding the closest keyword in the ontology to the attention-weighted word vector. With the suggested methodology, our tracker can overcome various difficulties due to the scarce training data that existing machine learning based trackers had, such as predicting user intents they haven't seen before. We show that our tracker outperforms other trackers submitted to the challenge with respect to most of the performance measures.
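
The prediction step mentioned in this abstract (pick the ontology keyword closest to the attention-weighted word vector) can be pictured with the minimal sketch below. It is an illustration under assumptions, not the authors' implementation: the attention scores are taken as given rather than produced by a bi-directional LSTM, cosine similarity stands in for the unspecified closeness measure, and the toy vectors and keyword names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_pool(word_vecs, scores):
    """Attention-weighted sum of word vectors."""
    weights = softmax(scores)
    dim = len(word_vecs[0])
    return [sum(w * v[k] for w, v in zip(weights, word_vecs))
            for k in range(dim)]


def cosine(u, v):
    """Cosine similarity (assumed closeness measure; vectors are non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_keyword(pooled, ontology):
    """Return the ontology keyword whose vector is most similar to pooled."""
    return max(ontology, key=lambda kw: cosine(pooled, ontology[kw]))


# Hypothetical 2-D embeddings for a three-word utterance; the second word
# receives a much higher attention score than the other two.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
scores = [0.0, 4.0, 0.0]
ontology = {"museum": [1.0, 0.0], "restaurant": [0.0, 1.0]}

pooled = attention_pool(word_vecs, scores)
print(closest_keyword(pooled, ontology))
```

Because the keyword is chosen by proximity in embedding space rather than by a fixed output layer, a keyword that never appeared in the training dialogs can still be selected, provided it has an embedding.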

Jiyeon Ham, Soohyun Lim, Kyeng-Hun Lee, and Kee-Eung Kim: Extensions to hybrid code networks for FAIR dialog dataset. Computer Speech and Language:12. 2018. [📄 Abstract] [🔗 Link]

Goal-oriented dialog systems require a different approach from chit-chat conversational systems in that they should perform various subtasks as well as continue the conversation itself. Since these systems typically interact with an external knowledge base that changes over time, it is desirable to incorporate domain knowledge to deal with such changes, yet with minimum human effort. This paper presents an extended version of the Hybrid Code Network (HCN) developed for the Facebook AI Research (FAIR) dialog dataset used in the Sixth Dialog System Technology Challenge (DSTC6). Compared to the original HCN, the system is more adaptable to changes in the knowledge base because its extended modules are learned from data. Using the proposed learning scheme with fairly elementary domain-specific rules, the proposed model achieved 100% accuracy on all test datasets.

Jang Won Bae, Junseok Lee, Do-Hyung Kim, Kanghoon Lee, Jongmin Lee, Kee-Eung Kim, and Il-Chul Moon: Layered Behavior Modeling via Combining Descriptive and Prescriptive Approaches: a Case Study of Infantry Company Engagement. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2018. [📄 Abstract] [🔗 Link]

Defense modeling and simulation (DM&S) has brought insights into how to efficiently operate combat entities, such as soldiers and weapon systems. Most DM&S works have been developed to reflect accurate descriptions of military doctrines, yet these doctrines provide only guidelines for military operations, not details about how the combat entities should behave. Because such unspecified parts are often filled in by the appropriate behavior of combat entities on the battlefield, some argue that DM&S should consider individual combat behaviors as well. However, discovering the best individual actions in an infinite search space, such as a battlefield, is known to be infeasible. This paper proposes layered behavior modeling to resolve this issue in practice. The proposed method applies descriptive modeling to reduce the search space by employing domain-specific knowledge, and prescriptive modeling to discover the best individual actions in the reduced space. For generality, both modeling methods are modularized and interact through an interface that is based on their semantic analogies. This paper presents a realization of the proposed method through a case study of infantry company-level operations. In the case study, the proposed method is implemented with the discrete event system specification formalism as the descriptive part and a Markov decision process as the prescriptive part. The experimental results show that the combat effectiveness resulting from the proposed method is statistically better than that from descriptive-only modeling, and that the difference is guided by the objective of the combat behavior. Through the presented experimental results and the discussion, this paper argues that future DM&S should consider a broad spectrum of the battlefield, incorporating the rational behavior of military individuals.

Kee-Eung Kim and Hyun-Soo Park: Imitation Learning via Kernel Mean Embedding. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2018. [📄 Abstract] [✏️ Paper] [🧑‍💻 Code]

Imitation learning refers to the problem where an agent learns a policy that mimics the demonstration provided by the expert, without any information on the cost function of the environment. Classical approaches to imitation learning usually rely on a restrictive class of cost functions that best explains the expert's demonstration, exemplified by linear functions of pre-defined features on states and actions. We show that the kernelization of a classical algorithm naturally reduces the imitation learning to a distribution learning problem, where the imitation policy tries to match the state-action visitation distribution of the expert. Closely related to our approach is the recent work on leveraging generative adversarial networks (GANs) for imitation learning, but our reduction to distribution learning is much simpler, robust to scarce expert demonstration, and sample efficient. We demonstrate the effectiveness of our approach on a wide range of high-dimensional control tasks.
2017

Jiyeon Ham, Soohyun Lim, and Kee-Eung Kim: Extended Hybrid Code Networks for DSTC6 FAIR Dialog Dataset. Dialog System Technology Challenges 6 Workshop. 2017. [📄 Abstract] [✏️ Paper]

Goal-oriented dialog systems require a different approach from chit-chat conversations in that they should perform various subtasks as well as carry on the dialog itself. Since these systems typically interact with an external database, it is efficient to import simple domain knowledge in order to deal with changes in the external knowledge. This paper presents extended hybrid code networks for the Facebook AI Research (FAIR) dialog dataset of the Sixth Dialog System Technology Challenge (DSTC6). Compared to the original hybrid code networks (HCNs), we reduced the required hand-coded rules and added trainable submodules. Owing to the additional learning components and reasonable domain-specific rules, the proposed model can be applied to more complex domains and achieved 100% accuracy on all test sets.

Yung-Kyun Noh, Masashi Sugiyama, Kee-Eung Kim, Frank Park, and Daniel Lee: Generative Local Metric Learning for Kernel Regression. Advances in Neural Information Processing Systems (NIPS). 2017. [📄 Abstract] [✏️ Paper]

This paper shows how metric learning can be used with Nadaraya-Watson (NW) kernel regression. Compared with standard approaches such as bandwidth selection, we show how metric learning can significantly reduce the mean square error (MSE) in kernel regression, particularly for high-dimensional data. We propose a method for efficiently learning a good metric function based upon analyzing the performance of the NW estimator for Gaussian-distributed data. A key feature of our approach is that the NW estimator with a learned metric uses information from both the global and local structure of the training data. Theoretical and empirical results confirm that the learned metric can considerably reduce the bias and MSE for kernel regression.
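
To make the role of the metric concrete, here is a minimal sketch of a Nadaraya-Watson estimator with a Gaussian kernel whose squared distance is measured under a metric matrix. It only shows where a metric enters the estimator; it does not reproduce the paper's metric-learning procedure. The function name `nw_predict`, the hand-picked metric, and the toy data are assumptions for illustration.

```python
import math


def nw_predict(x, xs, ys, metric, bandwidth=1.0):
    """Nadaraya-Watson estimate at query x with a Gaussian kernel.

    metric is a symmetric positive semi-definite matrix A (list of lists);
    the squared distance between x and xi is (x - xi)^T A (x - xi).
    Assumes at least one kernel weight is non-zero.
    """
    weights = []
    for xi in xs:
        d = [a - b for a, b in zip(x, xi)]
        sq_dist = sum(d[i] * metric[i][j] * d[j]
                      for i in range(len(d)) for j in range(len(d)))
        weights.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Toy data: the target depends only on the first coordinate.
xs = [(0.0, 0.0), (0.0, 5.0), (1.0, 0.0), (1.0, 5.0)]
ys = [0.0, 0.0, 1.0, 1.0]
query = (0.0, 5.0)  # the target value here should be close to 0

identity = [[1.0, 0.0], [0.0, 1.0]]
# Hand-picked metric (not learned): stretch the relevant first coordinate
# and ignore the irrelevant second one.
stretched = [[4.0, 0.0], [0.0, 0.0]]

print(nw_predict(query, xs, ys, identity))
print(nw_predict(query, xs, ys, stretched))
```

On this toy data the stretched metric gives an estimate closer to the target value of 0 than the identity metric does, which is the kind of error reduction a well-chosen metric is meant to deliver.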

Jang Won Bae, Bowon Nam, Kee-Eung Kim, Junseok Lee, and Il-Chul Moon: Hybrid Modeling and Simulation of Tactical Maneuvers in Computer Generated Force. Proceedings of the IEEE Conference on Systems, Man, and Cybernetics (SMC). 2017. [📄 Abstract] [✏️ Paper]

Defense modeling and simulation (DM&S) offers insights into the efficient operation of combat entities, e.g., soldiers and weapon systems. Most DM&S work aims at an exact description of military doctrines, but the doctrines often fail to provide detailed action procedures for how the combat entities conduct military operations. Such unspecified parts are filled in by the rational behaviors of the combat entities on the battlefield, and the combat effectiveness of these entities differs accordingly. Incorporating such rational factors could also provide insights that traditional approaches cannot capture. To examine this postulation, this paper develops a computer generated force in which the tactical maneuvers of combat entities are realized by a combination of descriptive and prescriptive modeling. Specifically, the descriptive models describe the explicit action rules in military doctrines and are modeled using the discrete event system specification (DEVS) formalism; the prescriptive models represent the rational behavior of the combat entities under the military doctrines and are modeled using partially observable Markov decision processes (POMDPs). The results illustrate that the proposed approach helps to maintain a team formation effectively, and that maintaining the formation leads to better combat efficiency.

Jongmin Lee, Youngsoo Jang, Pascal Poupart, and Kee-Eung Kim: Constrained Bayesian Reinforcement Learning via Approximate Linear Programming. ECML-PKDD Workshop on Scaling-Up Reinforcement Learning (SURL). 2017. [📄 Abstract] [✏️ Paper]

In this paper, we highlight our recent work (Lee et al., 2017) considering the safe learning scenario where we need to restrict the exploratory behavior of a reinforcement learning agent. Specifically, we treat the problem as a form of Bayesian reinforcement learning (BRL) in an environment that is modeled as a constrained MDP (CMDP) where the cost function penalizes undesirable situations. We propose a model-based BRL algorithm for such an environment, eliciting risk-sensitive exploration in a principled way. Our algorithm efficiently solves the constrained BRL problem by approximate linear programming, and generates a finite state controller in an off-line manner. We provide theoretical guarantees and demonstrate empirically that our approach outperforms the state of the art.

Jongmin Lee, Jungpyo Hong, Jaeyoung Park, Kanghoon Lee, Kee-Eung Kim, Il-Chul Moon, and Jaehyun Park: Case Studies on POMDP Planning and Learning for Large-Scale Computer Generated Forces through Counterfire and Mechanized Infantry Scenarios. KIISE Transactions on Computing Practices, 23(6):343-349. 2017. [📄 Abstract] [✏️ Paper]

In combat modeling and simulation of large-scale computer generated forces (CGFs), describing the behavior of rational combat entities that act autonomously is a key element for refining the operations of future battles and enabling efficient training simulations. The hierarchical DEVS-POMDP framework simulates large-scale CGFs by modeling high-level decision making that follows combat doctrine with DEVS, and low-level autonomous behavior planning that is hard to specify explicitly with POMDPs. Its drawback is that computing the optimal POMDP policy requires substantial computing resources. In this paper, through case studies of a counterfire simulation scenario and a mechanized infantry brigade offensive-operation simulation scenario modeled with DEVS-POMDP, we propose an efficient POMDP tree search algorithm and confirm that learning a model of enemy behavior patterns improves the performance of CGF combat entities.

Jongmin Lee, Youngsoo Jang, Pascal Poupart, and Kee-Eung Kim: Constrained Bayesian Reinforcement Learning via Approximate Linear Programming. Proceedings of International Joint Conference on Artificial Intelligence (IJCAI). 2017. [📄 Abstract] [✏️ Paper]

In this paper, we consider the safe learning scenario where we need to restrict the exploratory behavior of a reinforcement learning agent. Specifically, we treat the problem as a form of Bayesian reinforcement learning in an environment that is modeled as a constrained MDP (CMDP) where the cost function penalizes undesirable situations. We propose a model-based Bayesian reinforcement learning (BRL) algorithm for such an environment, eliciting risk-sensitive exploration in a principled way. Our algorithm efficiently solves the constrained BRL problem by approximate linear programming, and generates a finite state controller in an offline manner. We provide theoretical guarantees and demonstrate empirically that our approach outperforms the state of the art.

Byung-Jun Lee, Jongmin Lee, and Kee-Eung Kim: Hierarchically-partitioned Gaussian Process Approximation. Proceedings of Artificial Intelligence and Statistics (AISTATS). 2017. [📄 Abstract] [✏️ Paper]

The Gaussian process (GP) is a simple yet powerful probabilistic framework for various machine learning tasks. However, exact algorithms for learning and prediction are prohibitively expensive to apply to large datasets due to their inherent computational complexity. To overcome this limitation, various techniques have been proposed, in particular local GP algorithms that scale "truly linearly" with respect to the dataset size. In this paper, we introduce a hierarchical model based on local GP for large-scale datasets, which stacks inducing points over inducing points in layers. By using different kernels in each layer, the overall model becomes multi-scale and is able to capture both long- and short-range dependencies. We demonstrate the effectiveness of our model through its speed-accuracy performance on challenging real-world datasets.
2016

Youngsoo Jang, Jiyeon Ham, Byung-Jun Lee, Youngjae Chang, and Kee-Eung Kim: Neural Dialog State Tracker for Large Ontologies by Attention Mechanism. IEEE Workshop on Spoken Language Technology. 2016. [📄 Abstract] [✏️ Paper]

This paper describes in detail a dialog state tracker submitted to Dialog State Tracking Challenge 5 (DSTC 5). To tackle the challenging cross-language human-human dialog state tracking task with limited training data, we propose a tracker that focuses on words with meaningful context, based on an attention mechanism and a bi-directional long short-term memory (LSTM) network. The vocabulary, which includes many proper nouns, is vectorized using a sufficient amount of related text crawled from the web, so as to learn good embeddings for words that do not appear in the training dialogs. Despite its simplicity, our proposed tracker achieves high accuracy without sophisticated pre- and post-processing.

Daehyun Lee, Jongmin Lee, and Kee-Eung Kim: Multi-View Automatic Lip-Reading using Neural Network. ACCV 2016 Workshop on Multi-view Lip-reading Challenges. 2016. [📄 Abstract] [✏️ Paper]

It is well known that automatic lip-reading (ALR), also known as visual speech recognition (VSR), enhances the performance of speech recognition in noisy environments and also has applications of its own. However, ALR is a challenging task due to the variety of lip shapes and the ambiguity of visemes (the basic units of visual speech information). In this paper, we tackle ALR as a classification task using an end-to-end neural network based on convolutional neural network and long short-term memory architectures. We conduct single-view, cross-view, and multi-view experiments in a speaker-independent setting, with various network configurations for integrating the multi-view data. We achieve average classification accuracies of 77.9%, 83.8%, and 78.6% in the single-view, cross-view, and multi-view settings, respectively. This result is better than the best score (76%) among the preliminary single-view results given by the ACCV 2016 workshop on multi-view lip-reading/audiovisual challenges. It also shows that additional view information helps to improve the performance of ALR with neural network architectures.

Jungpyo Hong, Jongmin Lee, Kanghoon Lee, Sanggyu Han, Kee-Eung Kim, Il-Chul Moon, and Jaehyun Park: Case Study on POMDP Planning and Learning for Large-Scale Computer Generated Forces. Proceedings of the Korean Institute of Information Scientists and Engineers (KIISE) Summer Conference. 2016. [📄 Abstract] [✏️ Paper]

Combat modeling and simulation of large-scale computer generated forces (CGFs) makes it possible to refine the operations of future battles and to run efficient training simulations. To this end, the hierarchical DEVS-POMDP framework simulates the autonomous behavior of CGFs by modeling combat doctrine with DEVS and the resulting concrete behavior plans with POMDPs, but computing the optimal policy of a POMDP model still requires substantial computing resources. In this paper, through a case study of a Yeonpyeong Island counterfire simulation scenario modeled with DEVS-POMDP, we confirm that an efficient POMDP tree search algorithm and the learning of an enemy behavior model improve the performance of CGF combat entities.

Teakgyu Hong, Geon-Hyeong Kim, Byung-Jun Lee, and Kee-Eung Kim: A Probabilistic Approach to the Interceptor Weapon Allocation Problem Using Multi-armed Bandits. Proceedings of the Korean Institute of Information Scientists and Engineers (KIISE) Summer Conference. 2016. [📄 Abstract] [✏️ Paper]

This paper addresses the problem of deciding how many interceptor weapons to launch when an enemy fires weapons at a friendly base. Previous studies of the interceptor weapon allocation problem made the unrealistic assumption that the interception success probability is known. In actual warfare, however, this probability can differ from the assumed value depending on the situation, so a more realistic study must proceed under the assumption that it is unknown. We therefore model the interceptor weapon allocation problem as a multi-armed bandit problem under the assumption that the interception success probability is unknown, and present a method for solving it.

Teakgyu Hong, Jongmin Lee, Kee-Eung Kim, Pedro A. Ortega, and Daniel Lee: Bayesian Reinforcement Learning with Behavioral Feedback. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1571-1577. 2016. [📄 Abstract] [🔗 Link]

In the standard reinforcement learning setting, the agent learns an optimal policy solely from state transitions and rewards from the environment. We consider an extended setting where a trainer additionally provides feedback on the actions executed by the agent. This requires appropriately incorporating the feedback, even when the feedback is not necessarily accurate. In this paper, we present a Bayesian approach to this extended reinforcement learning setting. Specifically, we extend Kalman Temporal Difference learning to compute the posterior distribution over Q-values given the state transitions and rewards from the environment as well as the feedback from the trainer. Through experiments on standard reinforcement learning tasks, we show that learning performance can be significantly improved even with inaccurate feedback.

Byung-Jun Lee and Kee-Eung Kim: Dialog History Construction with Long-Short Term Memory for Robust Generative Dialog State Tracking. Dialogue & Discourse 7(3). 2016. [📄 Abstract] [✏️ Paper]

One of the crucial components of a dialog system is the dialog state tracker, which infers the user's intention from preliminary speech processing. Since the overall performance of the dialog system is heavily affected by that of the dialog state tracker, it has been one of the core areas of research on dialog systems. In this paper, we present a dialog state tracker that combines a generative probabilistic model of dialog state tracking with a recurrent neural network for encoding important aspects of the dialog history. We describe a two-step gradient descent algorithm that optimizes the tracker with a complex loss function. We demonstrate that this approach yields a dialog state tracker that performs competitively with the top-performing trackers that participated in the first and second Dialog State Tracking Challenges.

Yeganeh Mashayekh Hayeri, Kee-Eung Kim, and Daniel D. Lee: An Inverse Reinforcement Learning Approach to Car Following Behaviors. TRB 95th Annual Meeting Compendium of Papers, Transportation Research Board. 2016. [📄 Abstract] [🔗 Link]

In this study, we provide new insights into classic car-following theories by learning drivers' behavioral preferences. We model car-following behavior using decision-theoretic techniques. We assume the driver is a decision maker acting on a utility function that assigns a degree of desirability to each driving situation. Our method solves the inverse problem of control theory, known in modern machine learning terminology as inverse reinforcement learning. We use a publicly available dataset on car-following behavior known as the Bosch dataset, which includes headway distance, speed, and acceleration data. Our simulation results recover the reward function that makes the actual driving behavior in the data preferable to any other behavior. Understanding such behaviors and preferences is becoming crucial as we enter the modern era of transportation automation. Considering drivers' preferences when designing automation features would improve the safety and efficiency of the driving environment while ensuring a desirable and comfortable setting for those inside the vehicles.
2015

Teakgyu Hong, Byung-Jun Lee, Geon-Hyeong Kim, and Kee-Eung Kim: An Effective Approach to the Sequential Weapon Allocation Problem via Hierarchical Modeling. Proceedings of the Korean Institute of Information Scientists and Engineers (KIISE) Winter Conference. 2015. [📄 Abstract] [🔗 Link]

The weapon-allocation problem is the problem of effectively allocating friendly interceptor weapons against sporadic enemy weapon attacks. This paper addresses the sequential weapon allocation problem, in which weapons are fired at friendly forces over multiple time steps. In this setting, the number of states grows explosively with the number of enemy weapons, friendly assets, and friendly interceptor weapons, so classical algorithms can no longer solve the problem. We present a method that solves the weapon allocation problem effectively through hierarchical modeling built on existing algorithms.

Pedro Ortega, Kee-Eung Kim, and Daniel Lee: Reactive bandits with attitude. Proceedings of Artificial Intelligence and Statistics (AISTATS). 2015. [📄 Abstract] [✏️ Paper]

We consider a general class of K-armed bandits that adapt to the actions of the player. A single continuous parameter characterizes the "attitude" of the bandit, ranging from stochastic to cooperative or to fully adversarial in nature. The player seeks to maximize the expected return from the adaptive bandit, and the associated optimization problem is related to the free energy of a statistical mechanical system under an external field. When the underlying stochastic distribution is Gaussian, we derive an analytic solution for the long run optimal player strategy for different regimes of the bandit. In the fully adversarial limit, this solution is equivalent to the Nash equilibrium of a two-player, zero-sum semi-infinite game. We show how optimal strategies can be learned from sequential draws and reward observations in these adaptive bandits using Bayesian filtering and Thompson sampling. Results show the qualitative difference in policy regret between our proposed strategy and other well-known bandit algorithms.

Jaedeug Choi and Kee-Eung Kim: Hierarchical Bayesian Inverse Reinforcement Learning. IEEE Transactions on Cybernetics, 45(4). 2015. [📄 Abstract] [✏️ Paper]

Inverse reinforcement learning (IRL) is the problem of inferring the underlying reward function from the expert's behavior data. The difficulty in IRL mainly arises in choosing the best reward function since there are typically an infinite number of reward functions that yield the given behavior data as optimal. Another difficulty comes from the noisy behavior data due to sub-optimal experts. We propose a hierarchical Bayesian framework, which subsumes most of the previous IRL algorithms as well as models the sub-optimality of the expert's behavior. Using a number of experiments on a synthetic problem, we demonstrate the effectiveness of our approach including the robustness of our hierarchical Bayesian framework to the sub-optimal expert behavior data. Using a real dataset from taxi GPS traces, we additionally show that our approach predicts the driving behavior with a high accuracy.

Pascal Poupart, Aarti Malhotra, Pei Pei, Kee-Eung Kim, Bongseok Goh, and Michael Bowling: Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2015. [📄 Abstract] [✏️ Paper]

In many situations, it is desirable to optimize a primary objective while respecting some constraints with respect to secondary objectives. In this work, we describe a technique based on approximate linear programming to optimize policies in constrained partially observable Markov decision processes. The optimization is performed offline and produces a finite state controller with desirable performance guarantees. The approach performs favorably in comparison to a constrained version of point-based value iteration on a suite of benchmark problems.

Hyeoneun Kim, Woosang Lim, Kanghoon Lee, Yung-Kyun Noh, and Kee-Eung Kim: Reward Shaping for Model-Based Bayesian Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2015. [📄 Abstract] [✏️ Paper]

Bayesian reinforcement learning (BRL) provides a formal framework for optimal exploration-exploitation tradeoff in reinforcement learning. Unfortunately, it is generally intractable to find the Bayes-optimal behavior except for restricted cases. As a consequence, many BRL algorithms, model-based approaches in particular, rely on approximated models or real-time search methods. In this paper, we present potential-based shaping for improving the learning performance in model-based BRL. We propose a number of potential functions that are particularly well suited for BRL, and are domain-independent in the sense that they do not require any prior knowledge about the actual environment. By incorporating the potential function into real-time heuristic search, we show that we can significantly improve the learning performance in standard benchmark domains.

Kanghoon Lee and Kee-Eung Kim: Tighter Value Function Bounds for Bayesian Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2015. [📄 Abstract] [✏️ Paper]

Bayesian reinforcement learning (BRL) provides a principled framework for optimal exploration-exploitation tradeoff in reinforcement learning. We focus on model-based BRL, which involves a compact formulation of the optimal tradeoff from the Bayesian perspective. However, it still remains a computational challenge to compute the Bayes-optimal policy. In this paper, we propose a novel approach to compute tighter value function bounds of the Bayes-optimal value function, which is crucial for improving the performance of many model-based BRL algorithms. We then present how our bounds can be integrated into real-time AO* heuristic search, and provide a theoretical analysis on the impact of improved bounds on the search efficiency. We also provide empirical results on standard BRL domains that demonstrate the effectiveness of our approach.
2014

Byung-Jun Lee, Woosang Lim, and Kee-Eung Kim: Optimizing Generative Dialog State Tracker via Cascading Gradient Descent. Proceedings of the SIGDIAL, pp. 273-281. 2014. [📄 Abstract] [✏️ Paper]

For robust spoken dialog management, various dialog state tracking methods have been proposed. Although discriminative models are gaining popularity due to their superior performance, generative models based on the Partially Observable Markov Decision Process model still remain attractive since they provide an integrated framework for dialog state tracking and dialog policy optimization. Although a straightforward way to fit a generative model is to independently train the component probability models, we present a gradient descent algorithm that simultaneously trains all the component models. We show that the resulting tracker performs competitively with other top-performing trackers that participated in DSTC2.

Hyeoneun Kim, Bongseok Goh, Bowon Nam, Kanghoon Lee, Jeong Hee Hong, Il Chul Moon, and Kee-Eung Kim: Multi-Level Hybrid Behavior Model of Computer Generated Forces. Proceedings of the AAMAS Workshop on Agents, Virtual Societies and Analytics. 2014. [📄 Abstract] [✏️ Paper]

Computer Generated Forces (CGFs) refer to the simulation models of combat entities. While the holy grail of CGFs is the realistic reflection of the entities, it is difficult to achieve since the model is often too sophisticated to be replicated. Traditional models which translate field manuals to descriptive models generally produce reliable behaviors, but concerns remain that they are brittle in undescribed or unexpected situations. In this respect, automated planning approaches can produce robust behaviors for dynamic situations, but they demand too many computational resources to compute full-scale solutions. This paper proposes a multi-level behavior modeling approach that adopts the knowledge engineering approach to describe high-level tactical behavior rules and the automated planning approach to compute low-level combat actions in dynamic combat situations. We show that this two-level approach ensures reliable behaviors with moderate computation time.

ํ™ํƒ๊ทœ, ๊ณ ๋ด‰์„, and ๊น€๊ธฐ์‘: ํ‚ค-์‹œํ€€์Šค ์˜ˆ์ธก์„ ํ†ตํ•œ ๊ฐ€๋ณ€ํ˜• ์†Œํ”„ํŠธ ํ‚ค๋ณด๋“œ - ์•ˆ๋“œ๋กœ์ด๋“œ ํ”Œ๋žซํผ ์ ์šฉ ์‚ฌ๋ก€ ์—ฐ๊ตฌ. ํ•œ๊ตญ์ปดํ“จํ„ฐ์ข…ํ•ฉํ•™์ˆ ๋Œ€ํšŒ ๋…ผ๋ฌธ์ง‘, pp. 1767-1769. 2014. [๐Ÿ“„ Abstract] [โœ๏ธ Paper]

์†Œํ”„ํŠธ ํ‚ค๋ณด๋“œ(soft keyboard)๋Š” ์†Œํ”„ํŠธ์›จ์–ด๋กœ ์ œ์–ด๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ์ ๊ณผ ๋ฌผ๋ฆฌ์ ์ธ ํ‚ค๋ณด๋“œ๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ๋ณด๋‹ค ๋„“์€ ํ™”๋ฉด์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์žฅ์ ์œผ๋กœ ์ธํ•ด ์ตœ๊ทผ ๋Œ€๋ถ€๋ถ„์˜ ์Šค๋งˆํŠธํฐ์—์„œ ์ด์šฉ๋˜๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ์†Œํ”„ํŠธ ํ‚ค๋ณด๋“œ์˜ ์žฅ์ ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  ์Šค๋งˆํŠธํฐ ์ž์ฒด์˜ ์ž‘์€ ํ™”๋ฉด ํฌ๊ธฐ๋กœ ์ธํ•ด ์‚ฌ์šฉ์ž๋“ค์€ ๋งŽ์€ ์˜คํƒ€๋ฅผ ๋‚ธ๋‹ค. ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด Microsoft Research์˜ Gunawardana et al. ์€ ํ‚ค-์‹œํ€€์Šค(key-sequence) ์˜ˆ์ธก์„ ํ†ตํ•œ ๊ฐ€๋ณ€ํ˜• ์†Œํ”„ํŠธ ํ‚ค๋ณด๋“œ๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ด๋Š” ์‚ฌ์šฉ์ž๊ฐ€ ๋ˆ„๋ฅธ ํ‚ค๋ณด๋“œ ์œ„์น˜์™€ ํ˜„์žฌ๊นŒ์ง€ ์ž…๋ ฅํ•œ ๋ฌธ์ž๋“ค์„ ๋ฐ”ํƒ•์œผ๋กœ ๋‹ค์Œ์— ์ž…๋ ฅํ•  ๋ฌธ์ž๋“ค์— ๋Œ€ํ•œ ํ™•๋ฅ  ๋ชจ๋ธ์„ ๋งŒ๋“ค๊ณ , ์ด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ํ‚ค๋ณด๋“œ๋ฅผ ๊ตฌ์„ฑํ•˜๋Š” ๊ฐ ํ‚ค(key)์˜ ๊ฐ์ง€ ์˜์—ญ์„ ๋ณ€ํ™˜ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. ๋ณธ ๋…ผ๋ฌธ์€ ์•ˆ๋“œ๋กœ์ด๋“œ ํ”Œ๋žซํผ(android platform)์—์„œ ํ‚ค-์‹œํ€€์Šค ์˜ˆ์ธก์„ ํ†ตํ•œ ๊ฐ€๋ณ€ํ˜• ์†Œํ”„ํŠธ ํ‚ค๋ณด๋“œ๋ฅผ ์ ์šฉํ•œ ์‚ฌ๋ก€์™€ ํ•œ๊ตญ์–ด ํ‚ค๋ณด๋“œ๋กœ๋„ ์ ์šฉํ•œ ์‚ฌ๋ก€๋ฅผ ๋ณด์ธ๋‹ค.
2013

배장원, 이강훈, 김현은, 이준석, 고봉석, 남보원, 문일철, 김기응, and 박재현: Combat Entity Modeling Using POMDP-DEVS. Journal of the Korean Institute of Industrial Engineers, 39(6):498-516. 2013. [📄 Abstract] [✏️ Paper]

Combat Modeling and Simulation (M&S) is significant to decision makers who predict the next direction of wars. Classical methodologies for combat M&S aimed to describe the exact behaviors of combat entities from military doctrines, yet they were limited in describing reasonable behaviors of combat entities that did not appear in the doctrines. Hence, this paper proposes a synthesizing modeling methodology for combat entity models that considers both 1) the exact behaviors using descriptive modeling and 2) the reasonable behaviors using prescriptive modeling. With the proposed methodology, combat entities can represent the reality of combat actions better than with the classical methodologies. Moreover, the experiment results using the proposed methodology were significantly different from the results using the classical methodologies. Through the analyses of the experiment results, we show that the reasonable behaviors of combat entities, which are not specified in the doctrines, should be considered in combat M&S.

์ž„ํฌ์ง„, ์ตœ์žฌ๋“, ์„์žฌํ˜„, and ๊น€๊ธฐ์‘: ์ถ”์ฒœ์‹œ์Šคํ…œ์„ ์œ„ํ•œ ๋ฒ ์ด์ง€์•ˆ ํ˜‘๋ ฅ-๊ฒฝ์Ÿ ํ•„ํ„ฐ๋ง. ํ•œ๊ตญ์ปดํ“จํ„ฐ์ข…ํ•ฉํ•™์ˆ ๋Œ€ํšŒ ๋…ผ๋ฌธ์ง‘, pp. 1496-1498. 2013. [๐Ÿ“„ Abstract] [โœ๏ธ Paper]

์ถ”์ฒœ์‹œ์Šคํ…œ์€ ์‹œ์Šคํ…œ์ด ์ œ๊ณตํ•˜๋Š” ์ถ”์ฒœ๊ณผ ์ด์— ๋”ฐ๋ฅธ ์‚ฌ์šฉ์ž์˜ ์‘๋‹ต์ด๋ผ๋Š” ์ƒํ˜ธ์ž‘์šฉ์„ ์ˆ˜๋ฐ˜ํ•˜๋Š” ์‹œ์Šคํ…œ์ด๋‹ค. ์ด๋Ÿฌํ•œ ์ƒํ˜ธ์ž‘์šฉ ๊ณผ์ •์„ ์ถ”์ฒœ๋ชจ๋ธ์˜ ํ•™์Šต์— ์‚ฌ์šฉํ•˜๋Š” ํ˜‘๋ ฅ-๊ฒฝ์Ÿ ํ•„ํ„ฐ๋ง(Collaborative competitive filtering)์ด๋ผ๋Š” ์„ ํƒ๊ธฐ๋ฐ˜์˜ ์ถ”์ฒœ์‹œ์Šคํ…œ์ด ์ตœ๊ทผ ์ œ์•ˆ๋˜์—ˆ๋‹ค. ํ•˜์ง€๋งŒ ์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋งŽ์€ ๊ณ„์‚ฐ๋น„์šฉ์„ ์ˆ˜๋ฐ˜ํ•˜๋Š” ์ •๊ทœํ™” ๋งค๊ฐœ๋ณ€์ˆ˜(regularization parameter) ์กฐ์ •๊ณผ์ •์ด ํ•„์š”ํ•˜๋‹ค๋Š” ๋‹จ์ ์ด ์กด์žฌํ•œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ํ˜‘๋ ฅ-๊ฒฝ์Ÿ ํ•„ํ„ฐ๋ง์— ๋ฒ ์ด์ง€์•ˆ(Bayesian)๊ธฐ๋ฒ•์„ ์ ์šฉํ•˜์—ฌ ๋งค๊ฐœ๋ณ€์ˆ˜์˜ ์กฐ์ •๊ณผ์ •์ด ํ•„์š”ํ•˜์ง€ ์•Š์€ ๋ฒ ์ด์ง€์•ˆ ํ˜‘๋ ฅ-๊ฒฝ์Ÿ ํ•„ํ„ฐ๋ง(Bayesian collaborative competitive filtering)์„ ์ œ์•ˆํ•œ๋‹ค. ๋˜ํ•œ, ๋ชจ๋ธ์—์„œ์˜ ํšจ๊ณผ์ ์ธ ์ถ”๋ก ์„ ์œ„ํ•œ ๋งˆ๋ฅด์ฝ”ํ”„ ์‚ฌ์Šฌ ๋ชฌํ…Œ์นด๋ฅผ๋กœ(Markov chain Monte Carlo) ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์†Œ๊ฐœํ•˜๋ฉฐ, ๋Œ€๊ทœ๋ชจ์˜ ๋ฐ์ดํ„ฐ์„ธํŠธ์—์„œ์˜ ์‹คํ—˜์„ ํ†ตํ•˜์—ฌ ๋ฒ ์ด์ง€์•ˆ ํ˜‘๋ ฅ-๊ฒฝ์Ÿ ํ•„ํ„ฐ๋ง์ด ํ˜‘๋ ฅ-๊ฒฝ์Ÿ ํ•„ํ„ฐ๋ง๋ณด๋‹ค ์šฐ์ˆ˜ํ•จ์„ ํ™•์ธํ•˜์˜€๋‹ค.

Daejoong Kim, Jaedeug Choi, Kee-Eung Kim, Jungsu Lee, and Jinho Sohn: Engineering Statistical Dialog State Trackers: A Case Study on DSTC. Department of Computer Science, KAIST, Technical Report(CS-TR-2013-379). 2013. [📄 Abstract] [✏️ Paper]

We describe our experience with engineering the dialog state tracker for the first Dialog State Tracking Challenge (DSTC). Dialog trackers are one of the essential components of dialog systems which are used to infer the true user goal from the speech processing results. We explain the main parts of our tracker: the observation model, the belief refinement model, and the belief transformation model. We also report experimental results on a number of approaches to the models, and compare the overall performance of our tracker to other submitted trackers. This technical report is a companion to the shortened version presented at SIGDIAL 2013.

Daejoong Kim, Jaedeug Choi, Kee-Eung Kim, Jungsu Lee, and Jinho Sohn: Engineering Statistical Dialog State Trackers: A Case Study on DSTC. Proceedings of the SIGDIAL 2013 Conference, pp. 462-466. 2013. [📄 Abstract] [✏️ Paper]

We describe our experience with engineering the dialog state tracker for the first Dialog State Tracking Challenge (DSTC). Dialog trackers are one of the essential components of dialog systems which are used to infer the true user goal from the speech processing results. We explain the main parts of our tracker: the observation model, the belief refinement model, and the belief transformation model. We also report experimental results on a number of approaches to the models, and compare the overall performance of our tracker to other submitted trackers. An extended version of this paper is available as a technical report (Kim et al., 2013).

Jaedeug Choi and Kee-Eung Kim: Bayesian Nonparametric Feature Construction for Inverse Reinforcement Learning. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 2013. [📄 Abstract] [✏️ Paper] [🧑‍💻 Code]

Most of the algorithms for inverse reinforcement learning (IRL) assume that the reward function is a linear function of the pre-defined state and action features. However, it is often difficult to manually specify the set of features that can make the true reward function representable as a linear function. We propose a Bayesian nonparametric approach to identifying useful composite features for learning the reward function. The composite features are assumed to be the logical conjunctions of the predefined atomic features so that we can represent the reward function as a linear function of the composite features. We empirically show that our approach is able to learn composite features that capture important aspects of the reward function on synthetic domains, and predict taxi drivers' behaviour with high accuracy on a real GPS trace dataset.
2012

Jaedeug Choi and Kee-Eung Kim: Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions. Advances in Neural Information Processing Systems (NIPS). 2012. [📄 Abstract] [✏️ Paper] [🧑‍💻 Code]

We present a nonparametric Bayesian approach to inverse reinforcement learning (IRL) for multiple reward functions. Most previous IRL algorithms assume that the behaviour data is obtained from an agent who is optimizing a single reward function, but this assumption is hard to guarantee in practice. Our approach is based on integrating the Dirichlet process mixture model into Bayesian IRL. We provide an efficient Metropolis-Hastings sampling algorithm utilizing the gradient of the posterior to estimate the underlying reward functions, and demonstrate that our approach outperforms previous ones via experiments on a number of problem domains.

Dongho Kim, Kee-Eung Kim, and Pascal Poupart: Cost-Sensitive Exploration in Bayesian Reinforcement Learning. Advances in Neural Information Processing Systems (NIPS). 2012. [📄 Abstract] [✏️ Paper]

In this paper, we consider Bayesian reinforcement learning (BRL) where actions incur costs in addition to rewards, and thus exploration has to be constrained in terms of the expected total cost while learning to maximize the expected long term total reward. In order to formalize cost-sensitive exploration, we use the constrained Markov decision process (CMDP) as the model of the environment, in which we can naturally encode exploration requirements using the cost function. We extend BEETLE, a model-based BRL method, for learning in the environment with cost constraints. We demonstrate the cost-sensitive exploration behaviour in a number of simulated problems.

이강훈, 임희진, and 김기응: A POMDP Approach to Optimizing P300 Speller BCI Paradigm. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 20(4). 2012. [📄 Abstract] [✏️ Paper] [🔗 Link]

To achieve high performance in brain-computer interfaces (BCIs) using P300, most of the work has been focused on feature extraction and classification algorithms. Although significant progress has been made in such signal processing methods in the lower layer, the issues in the higher layer, specifically determining the stimulus schedule in order to identify the target reliably and efficiently, remain relatively unexplored. In this paper, we propose a systematic approach to compute an optimal stimulus schedule in P300 BCIs. Our approach adopts the partially observable Markov decision process, which is a model for planning in partially observable stochastic environments. We show that the thus obtained stimulus schedule achieves a significant performance improvement in terms of the success rate, bit rate, and practical bit rate through human subject experiments.

이강훈, 임희진, and 김기응: A Case Study on Modeling the Autonomous Behavior of Computer Generated Forces Using Factored POMDPs. Proceedings of the Korea Computer Congress (KCC), vol. 39(1B). 2012. [📄 Abstract] [✏️ Paper]

Modeling the autonomous behavior of computer generated forces is a key factor in determining the performance of battlefield simulation systems. The POMDP (partially observable Markov decision process) model, which enables optimal decision making by treating uncertain situations probabilistically, is a very natural framework for modeling this behavior. However, the difficulty of computing an optimal policy, which stems from the high computational complexity of the POMDP model, hinders its use for modeling the autonomous behavior of computer generated forces. In this paper, we use a factored POMDP model to model the autonomous behavior of large-scale computer generated forces, and we confirm its effectiveness through the "Hasty Defense" case study.

Byung Kon Kang and Kee-Eung Kim: Exploiting Symmetries for Single and Multi-Agent Partially Observable Stochastic Domains. Artificial Intelligence, 182-183:32-57. 2012. [📄 Abstract] [✏️ Paper]

While Partially Observable Markov Decision Processes (POMDPs) and their multi-agent extension Partially Observable Stochastic Games (POSGs) provide a natural and systematic approach to modeling sequential decision making problems under uncertainty, the computational complexity with which the solutions are computed is known to be prohibitively expensive. In this paper, we show how such high computational resource requirements can be alleviated through the use of symmetries present in the problem. The problem of finding the symmetries can be cast as a graph automorphism (GA) problem on a graphical representation of the problem. We demonstrate how such symmetries can be exploited in order to speed up the solution computation and provide computational complexity results.

๊น€๋™ํ˜ธ, ์ด์žฌ์†ก, ์ตœ์žฌ๋“, and ๊น€๊ธฐ์‘: ๋ณต์ˆ˜ ๋ฌด์ธ๊ธฐ๋ฅผ ์œ„ํ•œ POMDP ๊ธฐ๋ฐ˜ ๋™์  ์ž„๋ฌด ํ• ๋‹น ๋ฐ ์ •์ฐฐ ์ž„๋ฌด ์ตœ์ ํ™” ๊ธฐ๋ฒ•. ์ •๋ณด๊ณผํ•™ํšŒ ๋…ผ๋ฌธ์ง€: ์†Œํ”„ํŠธ์›จ์–ด ๋ฐ ์‘์šฉ, 39(6). 2012. [๐Ÿ“„ Abstract] [โœ๏ธ Paper]

์ตœ๊ทผ ๋ฌด์ธํ•ญ๊ณต๊ธฐ์˜ ์ œ์ž‘ ๊ธฐ์ˆ ์ด ๋ฐœ์ „ํ•จ์— ๋”ฐ๋ผ, ๋†์—…, ์žฌํ•ด ๊ด€์ธก์šฉ ๋“ฑ์˜ ๋ฏผ๊ฐ„ ์šฉ๋„ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์ •์ฐฐ ๋ฐ ๊ณต๊ฒฉ ๋“ฑ์˜ ๊ตฐ์‚ฌ์  ๋ชฉ์ ์œผ๋กœ ๋‹ค์ˆ˜์˜ ๋ฌด์ธ๊ธฐ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋‹ค์–‘ํ•œ ์‹œ๋„๊ฐ€ ์ง„ํ–‰๋˜๊ณ  ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋‹ค์ˆ˜์˜ ๋ฌด์ธ๊ธฐ๋ฅผ ์‚ฌ์šฉํ•  ๋•Œ์— ๊ฐ ๋ฌด์ธ๊ธฐ๋ฅผ ์‚ฌ๋žŒ์ด ์ง์ ‘ ์ œ์–ดํ•˜๋Š” ๋ฐ์—๋Š” ์–ด๋ ค์›€์ด ๋งŽ์œผ๋ฏ€๋กœ, ์ฃผ์–ด์ง„ ๋ชฉํ‘œ๋ฅผ ๋‹ฌ์„ฑํ•˜๊ธฐ ์œ„ํ•ด์„œ ์ž์œจ์ ์œผ๋กœ ํ˜‘๋ ฅํ•˜๋ฉฐ ํšจ๊ณผ์ ์ธ ํ–‰๋™์„ ์ˆ˜ํ–‰ํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๊ฐœ๋ฐœ์ด ํ•„์ˆ˜์ ์ด๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋Š” ์ˆœ์ฐจ์  ์˜์‚ฌ๊ฒฐ์ • ๋ฌธ์ œ๋กœ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋งˆ์ฝ”ํ”„ ์˜์‚ฌ๊ฒฐ์ • ๊ณผ์ •(Markov Decision Processes; MDPs)๊ณผ ์ด๋ฅผ ๋ถ€๋ถ„์  ํ˜น์€ ๋ถ€์ •ํ™•ํ•œ ๊ด€์ฐฐ๊ฐ’์„ ๋‹ค๋ฃฐ ์ˆ˜ ์žˆ๋„๋ก ํ™•์žฅํ•œ ๋ถ€๋ถ„๊ด€์ฐฐ ๋งˆ์ฝ”ํ”„ ์˜์‚ฌ๊ฒฐ์ • ๊ณผ์ • (Partially Observable MDPs; POMDPs) ๋“ฑ์˜ ๋Œ€ํ‘œ์ ์ธ ์˜์‚ฌ๊ฒฐ์ •์ด๋ก  ๋ชจ๋ธ์„ ์ด์šฉํ•˜์—ฌ ๋ณต์žกํ•˜๊ณ  ๋ถˆํ™•์‹คํ•œ ํ™˜๊ฒฝ์—์„œ์˜ ์˜์‚ฌ๊ฒฐ์ • ๋ฌธ์ œ๋ฅผ ํ†ต๊ณ„์ ์œผ๋กœ ๋‹ค๋ฃฐ ์ˆ˜ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ณต์ˆ˜์˜ ๋ฌด์ธ๊ธฐ๋ฅผ ์ด์šฉํ•  ๋•Œ ๋™์  ์ž„๋ฌด ํ• ๋‹น ๋ฐ ์ •์ฐฐ ์ž„๋ฌด ๋ฌธ์ œ๋ฅผ POMDP๋ฅผ ์ด์šฉํ•˜์—ฌ ํšจ์œจ์ ์œผ๋กœ ์ตœ์ ํ™”ํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ด๊ณ , ์„ผ์„œ์˜ ๊ด€์ฐฐ๊ฐ’์— ์˜ค์ฐจ๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ, MDP์— ๋น„ํ•ด POMDP๋ฅผ ์ด์šฉํ•  ๋•Œ ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ์–ป์„ ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค. ๋˜ํ•œ ์‹ค์ œ ์ฟผ๋“œ์ฝฅํ„ฐ(quadcopter)๋ฅผ ์ด์šฉํ•˜์—ฌ POMDP ์ •์ฑ…์ด ์‹ค์ œ ํ™˜๊ฒฝ์—์„œ๋„ ์ž˜ ๋™์ž‘ํ•จ์„ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์„ ํ†ตํ•ด ์ž…์ฆํ•˜์˜€๋‹ค.
2011

Jaedeug Choi and Kee-Eung Kim: MAP Inference for Bayesian Inverse Reinforcement Learning. Advances in Neural Information Processing Systems (NIPS). 2011. [📄 Abstract] [✏️ Paper]

The difficulty in inverse reinforcement learning (IRL) arises in choosing the best reward function since there are typically an infinite number of reward functions that yield the given behaviour data as optimal. Using a Bayesian framework, we address this challenge by using the maximum a posteriori (MAP) estimation for the reward function, and show that most of the previous IRL algorithms can be modeled into our framework. We also present a gradient method for the MAP estimation based on the (sub)differentiability of the posterior distribution. We show the effectiveness of our approach by comparing the performance of the proposed method to those of the previous algorithms.

Jaeyoung Park, Kee-Eung Kim, and Yoon-Kyu Song: A POMDP-based Optimal Control of P300-based Brain-Computer Interfaces. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) NECTAR Track. 2011. [📄 Abstract] [✏️ Paper]

Most of the previous work on brain-computer interfaces (BCIs) exploiting the P300 in electroencephalography (EEG) has focused on low-level signal processing algorithms such as feature extraction and classification methods. Although a significant improvement has been made in the past, the accuracy of detecting P300 is limited by the inherently low signal-to-noise ratio in EEGs. In this paper, we present a systematic approach to optimize the interface using partially observable Markov decision processes (POMDPs). Through experiments involving human subjects, we show that the P300 speller system optimized using the POMDP achieves a significant performance improvement in terms of the communication bandwidth in the interaction.

Dongho Kim, Jaesong Lee, Kee-Eung Kim, and Pascal Poupart: Point-Based Value Iteration for Constrained POMDPs. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 2011. [📄 Abstract] [✏️ Paper]

Constrained partially observable Markov decision processes (CPOMDPs) extend the standard POMDPs by allowing the specification of constraints on some aspects of the policy in addition to the optimality objective for the value function. CPOMDPs have many practical advantages over standard POMDPs since they naturally model problems involving limited resource or multiple objectives. In this paper, we show that the optimal policies in CPOMDPs can be randomized, and present exact and approximate dynamic programming methods for computing randomized optimal policies. While the exact method requires solving a minimax quadratically constrained program (QCP) in each dynamic programming update, the approximate method utilizes the point-based value update with a linear program (LP). We show that the randomized policies are significantly better than the deterministic ones. We also demonstrate that the approximate point-based method is scalable to solve large problems.

Dongho Kim, Jaesong Lee, Kee-Eung Kim, and Pascal Poupart: Point-Based Value Iteration for Constrained POMDPs. Proceedings of the IJCAI Workshop on Decision Making in Partially Observable, Uncertain Worlds: Exploring Insights from Multiple Communities. 2011. [📄 Abstract] [✏️ Paper]

Constrained partially observable Markov decision processes (CPOMDPs) extend the standard POMDPs by allowing the specification of constraints on some aspects of the policy in addition to the optimality objective for the value function. CPOMDPs have many practical advantages over standard POMDPs since they naturally model problems involving limited resource or multiple objectives. In this paper, we show that the optimal policies in CPOMDPs can be randomized, and present exact and approximate dynamic programming methods for computing randomized optimal policies. While the exact method requires solving a minimax quadratically constrained program (QCP) in each dynamic programming update, the approximate method utilizes the point-based value update with a linear program (LP). We show that the randomized policies are significantly better than the deterministic ones. We also demonstrate that the approximate point-based method is scalable to solve large problems.

Eunsoo Oh and Kee-Eung Kim: A Geometric Traversal Algorithm for Reward-Uncertain MDPs. Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI). 2011. [📄 Abstract] [✏️ Paper]

Markov decision processes (MDPs) are widely used in modeling decision making problems in stochastic environments. However, precise specification of the reward functions in MDPs is often very difficult. Recent approaches have focused on computing an optimal policy based on the minimax regret criterion for obtaining a robust policy under uncertainty in the reward function. One of the core tasks in computing the minimax regret policy is to obtain the set of all policies that can be optimal for some candidate reward function. In this paper, we propose an efficient algorithm that exploits the geometric properties of the reward function associated with the policies. We also present an approximate version of the method for further speed up. We experimentally demonstrate that our algorithm improves the performance by orders of magnitude.

๊น€๋™ํ˜ธ, ์ด์žฌ์†ก, ๊น€๊ธฐ์‘, and ํŒŒ์Šค์นผ ํ‘ธํŒŒ๋ฅด: ์ œ์•ฝ์„ ๊ฐ–๋Š” POMDP๋ฅผ ์œ„ํ•œ ์ -๊ธฐ๋ฐ˜ ๊ฐ€์น˜ ๋ฐ˜๋ณต ์•Œ๊ณ ๋ฆฌ์ฆ˜. ํ•œ๊ตญ์ปดํ“จํ„ฐ์ข…ํ•ฉํ•™์ˆ ๋Œ€ํšŒ ๋…ผ๋ฌธ์ง‘, vol. 38(1A). 2011. ์ตœ์šฐ์ˆ˜ ๋…ผ๋ฌธ์ƒ [๐Ÿ“„ Abstract] [โœ๏ธ Paper]

์ œ์•ฝ์„ ๊ฐ–๋Š” ๋ถ€๋ถ„ ๊ด€์ฐฐ ์˜์‚ฌ๊ฒฐ์ • ๊ณผ์ • (Constrained Partially Observable Markov Decision Process; CPOMDP)๋Š” ์ •์ฑ…์ด ์ œ์•ฝ (constraint)๋ฅผ ๋งŒ์กฑํ•˜๋ฉด์„œ ๊ฐ€์น˜ ํ•จ์ˆ˜๋ฅผ ์ตœ์ ํ™”ํ•˜๋„๋ก ์ผ๋ฐ˜์ ์ธ ๋ถ€๋ถ„ ๊ด€์ฐฐ ์˜์‚ฌ๊ฒฐ์ •๊ณผ์ • (POMDP)์„ ํ™•์žฅํ•œ ๋ชจ๋ธ์ด๋‹ค. CPOMDP๋Š” ์ œํ•œ๋œ ์ž์›์„ ๊ฐ€์ง€๊ฑฐ๋‚˜ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋ชฉ์  ํ•จ์ˆ˜๋ฅผ ๊ฐ€์ง€๋Š” ๋ฌธ์ œ๋ฅผ ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ๋ชจ๋ธ๋งํ•  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์ผ๋ฐ˜์ ์ธ POMDP์— ๋น„ํ•ด ๋” ์‹ค์šฉ์ ์ธ ์žฅ์ ์„ ๊ฐ€์ง„๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” CPOMDP์˜ ํ™•๋ฅ ์  ์ตœ์  ์ •์ฑ… ๋ฐ ๊ทผ์‚ฌ ์ตœ์  ์ •์ฑ…์„ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ๋Š” ์ตœ์  ๋ฐ ๊ทผ์‚ฌ ๋™์  ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ œ์•ˆํ•œ๋‹ค. ์ตœ์  ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋™์  ํ”„๋กœ๊ทธ๋ž˜๋ฐ์˜ ๊ฐ ๋‹จ๊ณ„๋ฏธ๋‹ค ๋ฏธ๋‹ˆ๋งฅ์Šค ์ด์ฐจ ์ œ์•ฝ ๊ณ„ํš ๋ฌธ์ œ๋ฅผ ๊ณ„์‚ฐํ•ด์•ผํ•˜๋Š” ๋ฐ˜๋ฉด์— ๊ทผ์‚ฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์„ ํ˜• ๊ณ„ํš ๋ฌธ์ œ๋งŒ์„ ํ•„์š”๋กœ ํ•˜๋Š” ์ -๊ธฐ๋ฐ˜ (point-based) ๊ฐ€์น˜ ์—…๋ฐ์ดํŠธ๋ฅผ ์ด์šฉํ•œ๋‹ค. ์‹คํ—˜ ๊ฒฐ๊ณผ, ํ™•๋ฅ ์  ์ •์ฑ…์ด ๊ฒฐ์ •์  (deterministic) ์ •์ฑ…๋ณด๋‹ค ๋” ๋‚˜์€ ์„ฑ๋Šฅ์„ ๋ณด์ด๋ฉฐ, ๊ทผ์‚ฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ํ†ตํ•ด ๊ณ„์‚ฐ ์‹œ๊ฐ„์„ ์ค„์ผ ์ˆ˜ ์žˆ์Œ์„ ๋ณด์˜€๋‹ค.

Pascal Poupart, Kee-Eung Kim, and Dongho Kim: Closing the Gap: Towards Provably Optimal POMDP Solutions. Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS). 2011. [📄 Abstract] [✏️ Paper]

POMDP algorithms have made significant progress in recent years by allowing practitioners to find good solutions to increasingly large problems. Most approaches (including point-based and policy iteration techniques) operate by refining a lower bound of the optimal value function. Several approaches (e.g., HSVI2, SARSOP, grid-based approaches and online forward search) also refine an upper bound. However, approximating the optimal value function by an upper bound is computationally expensive and therefore tightness is often sacrificed to improve efficiency (e.g., sawtooth approximation). In this paper, we describe a new approach to efficiently compute tighter bounds by i) conducting a prioritized breadth first search over the reachable beliefs, ii) propagating upper bound improvements with an augmented POMDP and iii) using exact linear programming (instead of the sawtooth approximation) for upper bound interpolation. As a result, we can represent the bounds more compactly and significantly reduce the gap between upper and lower bounds on several benchmark problems.

Dongho Kim, Jin Hyung Kim, and Kee-Eung Kim: Robust Performance Evaluation of POMDP-Based Dialogue Systems. IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 19(4). 2011. [📄 Abstract] [✏️ Paper] [🔗 Link]

Partially observable Markov decision processes (POMDPs) have received significant interest in research on spoken dialogue systems, due to, among many benefits, their ability to naturally model the dialogue strategy selection problem under unreliable automated speech recognition. However, the POMDP approaches are essentially model-based, and as a result, the dialogue strategy computed from POMDP is still subject to the correctness of the model. In this paper, we extend some of the previous MDP user models to POMDPs, and evaluate the effects of user models on the dialogue strategy computed from POMDPs. We experimentally show that the strategies computed from POMDPs perform better than those from MDPs, and the strategies computed from poor user models fail severely when tested on different user models. This paper further investigates the evaluation methods for dialogue strategies, and proposes a method based on the bias-variance analysis for reliably estimating the dialogue performance.

Jaedeug Choi and Kee-Eung Kim: Inverse Reinforcement Learning in Partially Observable Environments. Journal of Machine Learning Research (JMLR), 12. 2011. [📄 Abstract] [✏️ Paper] [🧑‍💻 Code]

Inverse reinforcement learning (IRL) is the problem of recovering the underlying reward function from the behavior of an expert. Most of the existing IRL algorithms assume that the environment is modeled as a Markov decision process (MDP), although it is desirable to handle partially observable settings in order to handle more realistic scenarios. In this paper, we present IRL algorithms for partially observable environments that can be modeled as a partially observable Markov decision process (POMDP). We deal with two cases according to the representation of the given expert's behavior, namely the case in which the expert's policy is explicitly given, and the case in which the expert's trajectories are available instead. The IRL in POMDPs poses a greater challenge than in MDPs since it is not only ill-posed due to the nature of IRL, but also computationally intractable due to the hardness in solving POMDPs. To overcome these obstacles, we present algorithms that exploit some of the classical results from the POMDP literature. Experimental results on several benchmark POMDP domains show that our work is useful for partially observable settings.

๊น€๋™ํ˜ธ and ๊น€๊ธฐ์‘: ๋ถ€๋ถ„๊ด€์ฐฐ ๋งˆ์ฝ”ํ”„ ์˜์‚ฌ๊ฒฐ์ •๊ณผ์ •์„ ์ด์šฉํ•œ ์ง€๋Šฅํ˜• ์—์ด์ „ํŠธ ๊ตฌํ˜„. ํ•œ๊ตญ์ •๋ณด๊ณผํ•™ํšŒ์ง€, 29(2). 2011. [๐Ÿ“„ Abstract] [โœ๏ธ Paper]

๋ณธ ๊ณ ์—์„œ๋Š” MDP ๋ฐ POMDP์˜ ๋ฐฉ๋ฒ•๋ก ์„ ์†Œ๊ฐœํ•˜๊ณ  ์‘์šฉ์‚ฌ๋ก€๋ฅผ ์‚ดํŽด๋ณด๋ฉฐ, ํŠนํžˆ ๋Œ€ํ™” ๊ธฐ๋ฐ˜ ์‹œ์Šคํ…œ๊ณผ ๋‡Œ-์ปดํ“จํ„ฐ ์ธํ„ฐํŽ˜์ด์Šค์˜ POMDP ์ ์šฉ ์‚ฌ๋ก€์— ๋Œ€ํ•ด ๋…ผํ•œ๋‹ค. ๋˜ํ•œ POMDP์— ๊ด€๋ จํ•˜์—ฌ ํ˜„์žฌ ์ตœ์‹  ๊ธฐ์ˆ  ํ˜„ํ™ฉ๊ณผ ์ค‘์š” ์—ฐ๊ตฌ ์ฃผ์ œ์— ๋Œ€ํ•ด์„œ ์‚ดํŽด๋ณธ๋‹ค.
2010

Wonjun Lee, Sunjun Kim, Younkyung Lim, Alice Oh, Tekjin Nam, and Kee-Eung Kim: A Rapid Prototyping Method for Discovering User-Driven Opportunities for Personal Informatics. Proceedings of the International Conference on Virtual Systems and Multimedia (VSMM). 2010. Best Paper Award [📄 Abstract] [✏️ Paper]

We present our ideas for a ubiquitous computing application for family life and happiness driven by human-centered discovery. We are particularly interested in the potential of personal informatics on discovering how “knowing thyself” can help us understand what people truly value in their lives. In this position paper, we discuss a new prototyping approach in which we apply the concept of personal informatics to enable designers and developers to discover potentially viable opportunities for personal lifecare systems for family members to promote their happiness and family values by using the tools of ubiquitous computing.

Younkyung Lim, Alice Oh, Tekjin Nam, and Kee-Eung Kim: Personal Informatics for Discovering Human-Centered Lifecare System Opportunities. Proceedings of the ACM CHI Workshop on Know Thyself. 2010. [📄 Abstract] [✏️ Paper]

We present our ideas for a ubiquitous computing application for family life and happiness driven by human-centered discovery. We are particularly interested in the potential of personal informatics on discovering how “knowing thyself” can help us understand what people truly value in their lives. In this position paper, we discuss a new prototyping approach in which we apply the concept of personal informatics to enable designers and developers to discover potentially viable opportunities for personal lifecare systems for family members to promote their happiness and family values by using the tools of ubiquitous computing.

Jaeyoung Park, Kee-Eung Kim, and Sungho Jo: A POMDP Approach to P300-Based Brain-Computer Interfaces. Proceedings of the ICAPS POMDP Practitioners Workshop. 2010. [๐Ÿ“„ Abstract] [โœ๏ธ Paper]

Most of the previous work on non-invasive brain-computer interfaces (BCIs) has focused on feature extraction and classification algorithms to achieve high performance for the communication between the brain and the computer. While significant progress has been made in the lower layer of the BCI system, the issues in the higher layer have not been sufficiently addressed. Existing P300-based BCI systems, for example the P300 speller, use a randomly ordered stimulus sequence to elicit the P300 signal for identifying users' intentions. This paper is about computing an optimal sequence of stimuli in order to minimize the number of stimuli, thereby improving performance. To accomplish this, we model the problem as a partially observable Markov decision process (POMDP), which is a model for planning in partially observable stochastic environments. Through simulation and human subject experiments, we show that our approach achieves a significant performance improvement in terms of the success rate and the bit rate.

Youngwook Kim and Kee-Eung Kim: Point-Based Bounded Policy Iteration for Decentralized POMDPs. Proceedings of Pacific-Rim Conference on Artificial Intelligence (PRICAI) / Lecture Notes in Computer Science (LNCS) 6230. 2010. Best Poster Award [📄 Abstract] [🔗 Link]

We present a memory-bounded approximate algorithm for solving infinite-horizon decentralized partially observable Markov decision processes (DEC-POMDPs). In particular, we improve upon the bounded policy iteration (BPI) approach, which searches for a locally optimal stochastic finite state controller, by adding reachability analysis on controller nodes. As a result, the algorithm has different optimization criteria for the reachable and the unreachable nodes, and it is more effective in the search for an optimal policy. Through experiments on benchmark problems, we show that our algorithm is competitive with the recent nonlinear optimization approach in both solution time and policy quality.

Jaeyoung Park, Kee-Eung Kim, and Sungho Jo: A POMDP Approach to P300-Based Brain-Computer Interfaces. Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI). 2010. [📄 Abstract] [✏️ Paper]

Most of the previous work on non-invasive brain-computer interfaces (BCIs) has focused on feature extraction and classification algorithms to achieve high performance for the communication between the brain and the computer. While significant progress has been made in the lower layer of the BCI system, the issues in the higher layer have not been sufficiently addressed. Existing P300-based BCI systems, for example the P300 speller, use a randomly ordered stimulus sequence to elicit the P300 signal for identifying users' intentions. This paper is about computing an optimal sequence of stimuli in order to minimize the number of stimuli, thereby improving performance. To accomplish this, we model the problem as a partially observable Markov decision process (POMDP), which is a model for planning in partially observable stochastic environments. Through simulation and human subject experiments, we show that our approach achieves a significant performance improvement in terms of the success rate and the bit rate.
2009

Jaedeug Choi and Kee-Eung Kim: Inverse Reinforcement Learning in Partially Observable Environments. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 2009. [📄 Abstract] [✏️ Paper] [🧑‍💻 Code]

Inverse reinforcement learning (IRL) is the problem of recovering the underlying reward function from the behaviour of an expert. Most of the existing algorithms for IRL assume that the expert's environment is modeled as a Markov decision process (MDP), although they should be able to handle partially observable settings in order to widen the applicability to more realistic scenarios. In this paper, we present an extension of the classical IRL algorithm by Ng and Russell to partially observable environments. We discuss technical issues and challenges, and present the experimental results on some of the benchmark partially observable domains.
2008

Dongho Kim, Hyeong Seop Sim, Kee-Eung Kim, Jin Hyung Kim, Hyunjeong Kim, and Joo Won Sung: Effects of User Modeling on POMDP-based Dialogue Systems. Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH). 2008. Best Student Paper Runner-up [📄 Abstract] [✏️ Paper]

Partially observable Markov decision processes (POMDPs) have gained significant interest in research on spoken dialogue systems due to, among many benefits, their ability to naturally model the dialogue strategy selection problem under unreliable automated speech recognition. However, POMDP approaches are essentially model-based, and as a result, the dialogue strategy computed from a POMDP is subject to the correctness of the model. In this paper, we extend some of the previous user models for POMDPs, and evaluate the effects of user models on the dialogue strategy computed from the POMDP.

Jae-Hyun Seok, Simon Levasseur, Kee-Eung Kim, and Jin Hyung Kim: Tracing Handwriting on Paper Document under Video Camera. Proceedings of the International Conference on Frontiers in Handwriting Recognition (ICFHR). 2008. [📄 Abstract] [✏️ Paper]

This paper describes a system that traces handwriting on a paper document under an overlooking video camera. The work is motivated by the goal of capturing annotations written on paper documents with an ordinary pen as input to a computer. As the trajectory of the pen tip is extracted from the video, each part of the trajectory is classified as 'pen-down' or 'pen-up', according to whether the part makes a dark line. Detecting written ink is not simple when handwriting is made over printed documents. Because written ink may fall on dark regions of the document and often overlaps previously written ink, simple background checking may not work on dark regions. We therefore interpolate the decisions made on entering and exiting the dark region. The system makes two-level decisions to achieve both speed and accuracy. The classifier makes quick decisions based on local information so as not to lose the pen trace. The local pen up-down decisions are then corrected from a global point of view once the whole information about the writing process is available, such as when the hand is out of view. Experimental results show that the system detects handwriting accurately even on printed documents.

Hyeong Seop Sim, Kee-Eung Kim, Jin Hyung Kim, Du-Seong Chang, and Myoung-Wan Koo: Symbolic Heuristic Search Value Iteration for Factored POMDPs. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2008. [📄 Abstract] [✏️ Paper]

We propose the symbolic heuristic search value iteration (Symbolic HSVI) algorithm, which extends the heuristic search value iteration (HSVI) algorithm in order to handle factored partially observable Markov decision processes (factored POMDPs). The idea is to use algebraic decision diagrams (ADDs) for compactly representing the problem itself and all the relevant intermediate computation results in the algorithm. We leverage Symbolic Perseus for computing the lower bound of the optimal value function using ADD operators, and provide a novel ADD-based procedure for computing the upper bound. Experiments on a number of standard factored POMDP problems show that we can achieve an order of magnitude improvement in performance over previously proposed algorithms.

Kee-Eung Kim: Exploiting Symmetries in POMDPs for Point-Based Algorithms. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2008. [📄 Abstract] [✏️ Paper]

We extend the model minimization technique for partially observable Markov decision processes (POMDPs) to handle symmetries in the joint space of states, actions, and observations. The POMDP symmetry we define in this paper cannot be handled by the model minimization techniques previously published in the literature. We formulate the problem of finding the symmetries as a graph automorphism (GA) problem, and although this problem is not yet known to be tractable, we experimentally show that the sparseness of the graph representing the POMDP allows us to quickly find symmetries. We show how the symmetries in POMDPs can be exploited for speeding up point-based algorithms. We experimentally demonstrate the effectiveness of our approach.
2007

Jihoon Kim, Taik Heon Rhee, Kee-Eung Kim, and Jin Hyung Kim: Place Recognition Using Multiple Wearable Cameras. Proceedings of the 4th International Symposium on Ubiquitous Computing Systems (UCS) / Lecture Notes in Computer Science (LNCS) 4836. 2007. [📄 Abstract] [✏️ Paper]

Recognizing a user's location is the most challenging problem for providing intelligent location-based services. In this paper, we present a realtime camera-based system for the place recognition problem. This system takes streams of scene images of a learned environment from user-worn cameras and produces the class label of the current place as an output. Multiple cameras are used to collect multi-directional scene images because utilizing multiple images yields better and more robust recognition than a single image. For more robust recognition, we utilize spatial relationships between the places. In addition, temporal reasoning is incorporated through a Markov model to reflect the typical staying time at each place. Recognition experiments, which were conducted in a real environment on a university campus, showed that the proposed method yields very promising results.

Jihoon Kim, Taik Heon Rhee, Kee-Eung Kim, and Jin Hyung Kim: Signboard Recognition by Consistency Checking of Local Features. 2nd Korea-Japan Joint Workshop on Pattern Recognition (KJPR). 2007. [📄 Abstract] [✏️ Paper]

The problem of recognizing signboards in street scenes is defined as matching the input image to pre-stored 2D signboard images. This problem is not as simple as it appears to be due to arbitrary drawings and relative 3D positions. We approached this problem by matching characteristic local features of the input image to those of images in the database. Local decisions are verified from the global viewpoint of homographic consistency and color consistency. The well-known SIFT feature is used as a local feature and the homographic consistency checking is performed using RANSAC, a random sampling method. In order to handle highly perspective-distorted signboards, several perspective-transformed templates are generated offline. In our experiment, with a database of 35 images, our proposed method achieved a 95% recognition rate, showing good results despite the highly distorted input images.
2006

Kee-Eung Kim, Wook Chang, Sung-Jung Cho, Junghyun Shim, Hyunjeong Lee, Joonah Park, Youngbeom Lee, and Sangryoung Kim: Hand Grip Pattern Recognition for Mobile User Interfaces. Proceedings of the Innovative Applications of Artificial Intelligence Conference (IAAI). 2006. [📄 Abstract] [✏️ Paper]

This paper presents a novel user interface for handheld mobile devices by recognizing hand grip patterns. In particular, we consider the scenario where the device is provided with an array of capacitive touch sensors underneath the exterior cover. In order to provide the users with an intuitive and natural manipulation experience, we use pattern recognition techniques for identifying the users' hand grips from the touch sensors. Preliminary user studies suggest that filtering out unintended hand grips is one of the most important issues to be resolved. We discuss the details of the prototype implementation, as well as engineering challenges for practical deployment.

Wook Chang, Kee-Eung Kim, Hyunjeong Lee, Joon Kee Cho, Byung Seok Soh, Jung Hyun Shim, Gyunghye Yang, Sung-Jung Cho, and Joonah Park: Recognition of Grip-Patterns by using Capacitive Touch Sensors. Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE). 2006. [📄 Abstract] [✏️ Paper]

A novel and intuitive way of accessing applications on mobile devices is presented. The key idea is to use the grip-pattern, which is naturally produced when a user tries to use the mobile device, as a cue to determine which application to launch. To this end, a capacitive touch sensor system is carefully designed and installed underneath the housing of the mobile device to capture the information of the user's grip-pattern. The captured data is then recognized by a minimum distance classifier and a naive Bayes classifier. A recognition test is performed to validate the feasibility of the proposed user interface system.

Kee-Eung Kim, Taeseo Park, Min-Kyu Park, Youngbeom Lee, Yunbae Kim, and Sangryoung Kim: Adaptive Event Clustering for Personalized Photo Browsing. 한국 HCI 학술대회 논문집 (Proceedings of Korean HCI Conference). 2006. [📄 Abstract] [✏️ Paper]

Since the introduction of the digital camera to the mass market, the number of digital photos owned by an individual has been growing at an alarming rate. This phenomenon naturally leads to difficulties in searching and browsing personal digital photo archives. The traditional approach typically involves content-based image retrieval using computer vision algorithms. However, due to the performance limitations of these algorithms, at least on the casual digital photos taken by non-professional photographers, more recent approaches are centered on time-based clustering algorithms, analyzing the shot times of photos. These time-based clustering algorithms are based on the insight that when photos are clustered according to shot-time similarity, we obtain “event clusters” that will help the user browse through her photo archive. It is also reported that one of the remaining problems with the time-based approach is that people perceive events at different scales. In this paper, we present an adaptive time-based clustering algorithm that exploits the usage history of digital photos in order to infer the user's preference for event granularity. Experiments show significant improvements in clustering accuracy.

Wook Chang, Kee-Eung Kim, Hyunjeong Lee, Joonki Cho, Byeongsuk Soh, Junghyun Shim, Kyunghye Yang, Sung-Jung Cho, and Junah Park: Designing Mobile User Interfaces Using Hand Grip Recognition. 한국 HCI 학술대회 논문집 (Proceedings of Korean HCI Conference). 2006.

2005

SeongHwan Cho and Kee-Eung Kim: Variable Bandwidth Allocation Scheme for Energy Efficient Wireless Sensor Network. Proceedings of the IEEE International Conference on Communications (ICC). 2005. [📄 Abstract] [✏️ Paper]

Increasing the lifetime of wireless sensors is essential for the proliferation of wireless sensor networks in various environments. In this paper, the relationship between bandwidth and energy consumption is exploited to increase the lifetime of the sensors. A variable bandwidth allocation scheme that uses time-frequency slot assignment is proposed to reduce the energy consumption of a collaborative sensor network which has large spatial variation in node density and event rates. To assign the time-frequency slots to the sensor network, a novel algorithm is presented, which results in significant energy savings over the conventional constant bandwidth allocation scheme.

Wook Chang, Juna Park, Kee-Eung Kim, Sung-Jung Cho, Hyun-Jung Lee, and Junghyun Shim: 접촉 센서를 이용한 사용자 인터페이스 설계 (Designing a Touch-based User Interface System for Handheld Devices). 한국 HCI 학술대회 논문집 (Proceedings of Korean HCI Conference). 2005. [📄 Abstract] [✏️ Paper]

This paper proposes a new interaction system for portable devices that combines two different types of sensors: a set of capacitive touch sensors and an accelerometer. The touch sensing system of this device can detect multiple finger-touches and finger proximity to the surface, while traditional touch sensing systems such as touchpads usually focus on recognizing the position of a single finger. In addition, a tri-axis accelerometer is used to measure motion information such as the inclination angle and vibration of the system caused by a user. Combining multi-finger touch and motion information, the proposed system provides users with a game-like experience by enhancing contextual navigation and realistic manipulation.
2003

Kee-Eung Kim and Thomas Dean: Solving Factored MDPs Using Non-Homogeneous Partitions. Artificial Intelligence, 147(1-2). 2003.

2002

Kee-Eung Kim and Thomas Dean: Solving Factored MDPs with Large Action Space Using Algebraic Decision Diagrams. Proceedings of Pacific-Rim Conference on Artificial Intelligence (PRICAI) / Lecture Notes in Computer Science (LNCS) 2417. 2002.

2001

Kee-Eung Kim and Thomas Dean: Solving Factored MDPs via Non-homogeneous Partitioning. Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI). 2001.

Nicolas Meuleau, Leonid Peshkin, and Kee-Eung Kim: Exploration in Gradient-based Reinforcement Learning. MIT, AI Memo(2001-003). 2001.

2000

Kee-Eung Kim, Thomas Dean, and Nicolas Meuleau: Approximate Solutions to Factored Markov Decision Processes via Greedy Search in the Space of Finite State Controllers. Proceedings of the Fifth International Conference on Artificial Intelligence in Planning and Scheduling (AIPS). 2000.

Leonid Peshkin, Kee-Eung Kim, Nicolas Meuleau, and Leslie Pack Kaelbling: Learning to Cooperate via Policy Search. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI). 2000.

Kee-Eung Kim, Thomas Dean, and Samuel Hazlehurst: Linear Algebra in Very High-Dimension Vector Spaces With an Application to Solving Markov Decision Processes. Neural Computing Surveys, 3. 2000.

Kee-Eung Kim, Thomas Dean, and Samuel Hazlehurst: Linear Algebra in Very High-Dimension Vector Spaces: Algorithms and Data Structures for Implementing Exact and Approximate Solution Methods. Department of Computer Science, Brown University, Technical Report(CS-00-02). 2000.

1999

Thomas Dean, Kee-Eung Kim, and Samuel Hazlehurst: Linear Algebra in Very High-Dimension Vector Spaces With an Application to Solving Markov Decision Processes. Proceedings of IJCAI-99 Workshop on Statistical Machine Learning for Large-Scale Optimization. 1999.

Nicolas Meuleau, Leonid Peshkin, Kee-Eung Kim, and Leslie Pack Kaelbling: Learning Finite-State Controllers for Partially Observable Environments. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI). 1999.

Nicolas Meuleau, Kee-Eung Kim, Leslie Pack Kaelbling, and Anthony R. Cassandra: Solving POMDPs by Searching the Space of Finite Policies. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI). 1999.

1998

Nicolas Meuleau, Milos Hauskrecht, Kee-Eung Kim, Leonid Peshkin, Leslie Pack Kaelbling, Thomas Dean, and Craig Boutilier: Solving Very Large Weakly Coupled Markov Decision Processes. Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI). 1998.

Thomas Dean, Kee-Eung Kim, and Robert Givan: Solving Planning Problems with Large State and Action Spaces. Proceedings of the Fourth International Conference on Artificial Intelligence Planning Systems (AIPS). 1998.