Recent publications in online learning and decision-making under uncertainty highlight a significant shift toward more complex, real-world scenarios, including non-stationary environments, multi-objective optimization, and the integration of structural knowledge into learning algorithms to improve efficiency and performance. A common theme across these studies is the development of algorithms that not only achieve optimal or near-optimal performance but also adapt to the inherent uncertainties and dynamics of the environments in which they operate. This includes advances in regret minimization, where new algorithms achieve tighter bounds under various feedback models and constraints. There is also a notable emphasis on practical applicability, with several papers validating their approaches on simulations and real-world datasets.
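To make the regret-minimization setting concrete before turning to the individual papers, here is a minimal sketch of a classical UCB1 learner on a stochastic Bernoulli bandit. This is an illustrative baseline, not an algorithm from any paper below; the arm means, horizon, and exploration constant are assumptions chosen for the example.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit and return the cumulative pseudo-regret."""
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms      # number of pulls per arm
    totals = [0.0] * n_arms    # summed rewards per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # pull each arm once to initialize estimates
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(range(n_arms),
                      key=lambda a: totals[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        regret += best - means[arm]  # pseudo-regret: suboptimality gap of the pull
    return regret

# Hypothetical instance: three arms, 10,000 rounds; regret grows logarithmically.
print(ucb1([0.3, 0.5, 0.6], 10_000))
```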
Noteworthy Papers
- Improved learning rates in multi-unit uniform price auctions: Introduces a novel modeling of the bid space, achieving a tighter regret bound under bandit feedback and proposing a new feedback model that interpolates between full-information and bandit scenarios.
- Logarithmic Regret for Nonlinear Control: Demonstrates that fast sequential learning with logarithmic regret is achievable in controlling nonlinear dynamical systems, provided the optimal control policy is persistently exciting.
- Pairwise Elimination with Instance-Dependent Guarantees for Bandits with Cost Subsidy: Introduces the Pairwise-Elimination algorithm and its generalization, PE-CS, for the MAB-CS framework, achieving order-wise logarithmic upper bounds on both Cost Regret and Quality Regret (a generic elimination sketch appears after this list).
- Sequential Change Detection for Learning in Piecewise Stationary Bandit Environments: Proposes change-detection tests for piecewise stationary bandits that are order-optimal in the horizon and satisfy properties needed for regret analysis (see the change-detection sketch after this list).
- Online Clustering with Bandit Information: Introduces the ATBOC and LUCBBOC algorithms for online clustering within the multi-armed bandit framework, proving order optimality and demonstrating effectiveness through numerical experiments.
- Multi-Stage Active Sequential Hypothesis Testing with Clustered Hypotheses: Proposes a deterministic and adaptive multi-stage hypothesis-elimination algorithm that achieves vanishing Average Bayes Risk as the error probability approaches zero.
- The regret lower bound for communicating Markov Decision Processes: Extends the regret lower bound beyond ergodic MDPs, introducing the concept of co-exploration and providing an algorithm to approximate the lower bound.
- Learning to Bid in Non-Stationary Repeated First-Price Auctions: Addresses the complexity of determining an optimal bidding strategy in first-price auctions under non-stationary conditions, introducing metrics to quantify the regularity of the bidding sequence.
- Minimizing Queue Length Regret for Arbitrarily Varying Channels: Proposes a weakly adaptive MAB algorithm for minimizing queue length regret in the presence of non-stationary and adversarially controlled channel rates.
- Optimal Multi-Objective Best Arm Identification with Fixed Confidence: Investigates the multi-objective best arm identification problem, proposing an algorithm that uses surrogate proportions to sample arms and is shown to be asymptotically optimal.
- Revisiting Online Learning Approach to Inverse Linear Optimization: Provides a new perspective on the online learning approach to inverse linear optimization through the lens of Fenchel–Young losses, achieving a faster rate than the standard regret bound.
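The elimination idea behind the Pairwise-Elimination entry above can be illustrated with a generic successive-elimination routine. This is a hedged sketch of the general technique, not the PE-CS algorithm from the paper; the instance, confidence radius, and failure probability `delta` are assumptions for illustration.

```python
import math
import random

def successive_elimination(means, delta=0.05, seed=0):
    """Sample every active arm once per round and eliminate any arm whose
    upper confidence bound falls below the best lower confidence bound.
    Returns the index of the last surviving arm."""
    rng = random.Random(seed)
    n_arms = len(means)
    active = list(range(n_arms))
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    while len(active) > 1:
        for a in active:
            counts[a] += 1
            totals[a] += 1.0 if rng.random() < means[a] else 0.0
        # Confidence radius from a union bound over arms and rounds.
        rad = {a: math.sqrt(math.log(4 * n_arms * counts[a] ** 2 / delta)
                            / (2 * counts[a]))
               for a in active}
        best_lcb = max(totals[a] / counts[a] - rad[a] for a in active)
        active = [a for a in active
                  if totals[a] / counts[a] + rad[a] >= best_lcb]
    return active[0]

print(successive_elimination([0.4, 0.55, 0.7]))  # hypothetical three-arm instance
```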
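For the piecewise-stationary entry, the flavor of change detection can be conveyed with a crude two-window mean-shift monitor that restarts the learner when the recent reward mean drifts. The window length and threshold are invented for illustration, and this simple test is only a stand-in for the order-optimal detectors analyzed in the paper.

```python
from collections import deque

def mean_shift_detected(window, threshold=0.25):
    """Crude two-sample test: flag a change once a full sliding window's
    newer half differs from its older half by more than `threshold`."""
    if len(window) < window.maxlen:
        return False               # wait until the window is full
    half = window.maxlen // 2
    old, new = list(window)[:half], list(window)[half:]
    return abs(sum(new) / len(new) - sum(old) / len(old)) > threshold

# Usage inside a bandit loop: keep one window per arm and restart the
# learner (reset all counts and estimates) whenever the test fires.
window = deque(maxlen=100)
for reward in [0.0] * 60 + [1.0] * 60:  # synthetic reward stream with a shift
    window.append(reward)
    if mean_shift_detected(window):
        print("change detected; restarting the bandit learner")
        window.clear()
        break
```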