Pedro Ortega, Kee-Eung Kim, and Daniel Lee (2015)

# Reactive bandits with attitude

In: Proceedings of Artificial Intelligence and Statistics (AISTATS).

We consider a general class of K-armed bandits that adapt to the actions of the player. A single continuous parameter characterizes the "attitude" of the bandit, which ranges from stochastic to either cooperative or fully adversarial. The player seeks to maximize the expected return from the adaptive bandit, and the associated optimization problem is related to the free energy of a statistical mechanical system under an external field. When the underlying stochastic distribution is Gaussian, we derive an analytic solution for the long-run optimal player strategy in the different regimes of the bandit. In the fully adversarial limit, this solution is equivalent to the Nash equilibrium of a two-player, zero-sum semi-infinite game. We show how optimal strategies can be learned from sequential draws and reward observations in these adaptive bandits using Bayesian filtering and Thompson sampling. Results show the qualitative difference in policy regret between our proposed strategy and other well-known bandit algorithms.
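For readers unfamiliar with Thompson sampling, the following is a minimal sketch of the standard algorithm in the purely stochastic Gaussian case only. It is not the adaptive-bandit strategy of the paper: the bandit here does not react to the player, and the function name `gaussian_thompson_sampling`, the N(0, 1) prior on each arm's mean, and the known noise standard deviation are all assumptions made for illustration.

```python
import numpy as np


def gaussian_thompson_sampling(true_means, n_rounds=2000, noise_std=1.0, seed=0):
    """Standard Thompson sampling on a stationary K-armed Gaussian bandit.

    Illustrative only: the arms do NOT adapt to the player, unlike the
    reactive bandits described in the abstract.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    k = true_means.size

    # Assumed prior: each arm's mean is N(0, 1); tracked as mean and precision.
    post_mean = np.zeros(k)
    post_prec = np.ones(k)
    noise_prec = 1.0 / noise_std**2  # reward noise variance is assumed known
    pulls = np.zeros(k, dtype=int)

    for _ in range(n_rounds):
        # Draw one plausible mean per arm from the current posterior.
        sampled = rng.normal(post_mean, 1.0 / np.sqrt(post_prec))
        arm = int(np.argmax(sampled))

        # Observe a noisy reward from the chosen arm.
        reward = rng.normal(true_means[arm], noise_std)

        # Conjugate Gaussian update (a simple Bayesian filtering step).
        new_prec = post_prec[arm] + noise_prec
        post_mean[arm] = (post_prec[arm] * post_mean[arm] + noise_prec * reward) / new_prec
        post_prec[arm] = new_prec
        pulls[arm] += 1

    return pulls, post_mean


if __name__ == "__main__":
    pulls, post_mean = gaussian_thompson_sampling([0.0, 0.5, 1.5])
    print("pulls per arm:", pulls)
    print("posterior means:", np.round(post_mean, 2))
```

In this stationary setting the player's draws concentrate on the arm with the highest mean. Handling a bandit whose reward distribution shifts in response to the player's strategy, as studied in the paper, would require a different posterior model and is not attempted here.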