The field of bandit algorithms and decision-making models is seeing significant developments focused on balancing exploration and exploitation. Researchers are exploring new ways to model boundedly rational decision-making, including the use of Wasserstein distances, which capture the nearness of ordinal actions and thereby address a limitation of existing divergence-based formulations (illustrated in the first sketch after the paper list). Another area of innovation is meta-learning bandit algorithms, which learn fast, interpretable exploration plans for a fixed collection of bandits. There is also growing interest in robust online decision-making, which generalizes classical bandits and reinforcement learning by allowing robust, multivalued models. Notable papers in this area include:
- A paper proposing an exploration-free method for linear stochastic bandits driven by a linear Gaussian dynamical system, which uses Kalman filter predictions to select actions (see the second sketch after this list).
- A paper introducing a classification view on meta-learning bandits, which achieves a test regret that scales with O(λ^−2 C_λ(M) log^2(MH)).
- A paper deriving regret bounds for robust online decision-making, which generalizes decision-making with structured observations by allowing robust models.
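To make the Wasserstein point concrete, the following minimal sketch, not taken from any of the papers above, shows why a 1-Wasserstein distance respects the ordering of ordinal actions where a divergence such as KL does not. The action levels and distributions are purely illustrative.

```python
# Minimal sketch: 1-Wasserstein distance over an ordered (ordinal) action set.
import numpy as np

def wasserstein_1d(p, q, support):
    """W1 distance between two discrete distributions on an ordered support."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    cdf_diff = np.cumsum(p - q)                 # pointwise CDF difference
    gaps = np.diff(np.asarray(support, float))  # spacing between ordinal levels
    return float(np.sum(np.abs(cdf_diff[:-1]) * gaps))

actions = [1, 2, 3, 4, 5]             # ordinal action levels (illustrative)
target  = [0.0, 0.0, 1.0, 0.0, 0.0]   # policy concentrated on action 3
near    = [0.0, 1.0, 0.0, 0.0, 0.0]   # all mass on the adjacent action 2
far     = [1.0, 0.0, 0.0, 0.0, 0.0]   # all mass on the distant action 1

# W1 grows with how far the mass moves along the action scale ...
print(wasserstein_1d(target, near, actions))  # 1.0
print(wasserstein_1d(target, far, actions))   # 2.0
# ... whereas KL divergence is infinite in both cases (disjoint supports),
# treating "one step away" and "four steps away" identically.
```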
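The second sketch illustrates the exploration-free idea under stated assumptions, not the cited paper's exact algorithm: rewards are noisy linear observations of a latent parameter that evolves as a linear Gaussian dynamical system, a Kalman filter tracks that parameter, and actions are chosen greedily from the filter's predictions. The dynamics matrix, noise levels, and feature vectors below are placeholders.

```python
# Minimal sketch: greedy (exploration-free) linear bandit with a Kalman filter
# tracking a drifting reward parameter theta_t (all quantities are placeholders).
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, horizon = 3, 5, 200
A = 0.95 * np.eye(d)                 # latent state transition (assumed known)
Q = 0.01 * np.eye(d)                 # process noise covariance
R = 0.1                              # reward (observation) noise variance
arms = rng.normal(size=(n_arms, d))  # fixed feature vectors x_a

theta = rng.normal(size=d)           # true latent parameter (hidden)
mu, Sigma = np.zeros(d), np.eye(d)   # filter belief over theta_t

for t in range(horizon):
    # Predict step: propagate the belief through the known dynamics.
    mu_pred = A @ mu
    Sigma_pred = A @ Sigma @ A.T + Q

    # Exploration-free choice: act greedily on the predicted mean reward.
    a = int(np.argmax(arms @ mu_pred))
    x = arms[a]

    # Environment: latent parameter drifts, reward is a noisy linear observation.
    theta = A @ theta + rng.multivariate_normal(np.zeros(d), Q)
    reward = x @ theta + rng.normal(scale=np.sqrt(R))

    # Update step: standard Kalman correction with observation vector x.
    S = x @ Sigma_pred @ x + R        # innovation variance (scalar)
    K = Sigma_pred @ x / S            # Kalman gain
    mu = mu_pred + K * (reward - x @ mu_pred)
    Sigma = Sigma_pred - np.outer(K, x) @ Sigma_pred
```

The intuition, under these assumptions, is that the drifting parameter itself supplies enough variation for the greedy policy to keep receiving informative observations without an explicit exploration bonus.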