Advances in Bandit Algorithms and Decision-Making Models

The field of bandit algorithms and decision-making models is seeing significant developments, with a focus on improving the trade-off between exploration and exploitation. Researchers are exploring new approaches to modelling bounded rational decision-making, including Wasserstein constraints, which capture the nearness of ordinal actions and sidestep limitations of existing methods (a toy illustration follows the paper list below). Another line of work develops meta-learning bandit algorithms that learn fast, interpretable exploration plans for a fixed collection of bandits. There is also growing interest in robust online decision-making, which generalizes classical bandits and reinforcement learning by allowing robust, multivalued models. Notable papers in this area include:

  • A paper proposing an exploration-free method for linear stochastic bandits driven by a linear Gaussian dynamical system, which uses Kalman filter predictions to select actions (a sketch of the idea follows this list).
  • A paper introducing a classification view on meta-learning bandits, which achieves test regret scaling as O(λ^−2 C_λ(M) log^2(MH)).
  • A paper deriving regret bounds for robust online decision-making, which generalizes decision-making with structured observations by allowing robust models.
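
To make the exploration-free idea concrete, below is a minimal sketch under stated assumptions, not the paper's exact algorithm: the unknown reward parameter follows a known linear Gaussian dynamical system, a Kalman filter tracks it, and each round the arm with the highest predicted reward under the filter's one-step prediction is pulled, with no explicit exploration.

```python
import numpy as np

# Hypothetical setup (assumed for illustration): theta_{t+1} = A @ theta_t + w_t,
# w_t ~ N(0, Q); reward r_t = <x_a, theta_t> + v_t, v_t ~ N(0, R). A, Q, R known.
rng = np.random.default_rng(0)
d, n_arms, horizon = 3, 5, 200

A = 0.95 * np.eye(d)                 # state transition matrix
Q = 0.01 * np.eye(d)                 # process noise covariance
R = 0.1                              # reward noise variance
arms = rng.normal(size=(n_arms, d))  # fixed arm feature vectors x_a

theta = rng.normal(size=d)           # true latent parameter (hidden from the learner)
mu, Sigma = np.zeros(d), np.eye(d)   # Kalman posterior over theta_t

for t in range(horizon):
    # Prediction step: propagate the posterior one step forward.
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q

    # Exploration-free (greedy) choice: arm with the highest predicted reward.
    a = int(np.argmax(arms @ mu_pred))
    x = arms[a]

    # Environment evolves and returns a noisy reward for the chosen arm.
    theta = A @ theta + rng.multivariate_normal(np.zeros(d), Q)
    reward = x @ theta + rng.normal(scale=np.sqrt(R))

    # Update step, treating the chosen arm's features as the observation vector.
    S = x @ Sigma_pred @ x + R
    K = Sigma_pred @ x / S
    mu = mu_pred + K * (reward - x @ mu_pred)
    Sigma = Sigma_pred - np.outer(K, x) @ Sigma_pred
```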

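As an aside on why Wasserstein constraints suit ordinal actions, the toy comparison below (an illustration, not the paper's formulation) contrasts the 1-Wasserstein distance with the KL divergence between a degenerate prior policy and two shifted policies: only the Wasserstein distance reflects how far probability mass has to move along the ordered action set.

```python
import numpy as np
from scipy.stats import entropy, wasserstein_distance

# Toy example: five ordered action levels 0..4 and a prior that always picks 0.
actions = np.arange(5)
prior = np.array([1.0, 0.0, 0.0, 0.0, 0.0])

near = np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # all mass on the adjacent action
far = np.array([0.0, 0.0, 0.0, 0.0, 1.0])   # all mass on the farthest action

# The 1-Wasserstein distance respects the ordering: moving mass one step is
# cheaper than moving it four steps.
print(wasserstein_distance(actions, actions, near, prior))  # 1.0
print(wasserstein_distance(actions, actions, far, prior))   # 4.0

# KL divergence ignores the metric structure of the action set and rates
# both shifts as equally (infinitely) far from the prior.
print(entropy(near, prior))  # inf
print(entropy(far, prior))   # inf
```
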
Sources

An Exploration-free Method for a Linear Stochastic Bandit Driven by a Linear Gaussian Dynamical System

Modelling bounded rational decision-making through Wasserstein constraints

A Classification View on Meta Learning Bandits

Regret Bounds for Robust Online Decision Making
