Sequential Decision Making and Optimization Research

Report on Recent Developments in Sequential Decision Making and Optimization Research

General Direction of the Field

The field of sequential decision making and optimization has seen significant advances, particularly on complex, large-scale problems. Recent research has focused on algorithms that remain efficient and effective in the presence of unbounded context distributions, global environment shifts, and interference between units in decision-making processes. The emphasis has been on methods that are sample-optimal, consistent, and able to mitigate the curse of dimensionality that pervades these domains.

One key trend is the development of algorithms that can operate in non-stationary environments, as seen in work on non-stationary stochastic bandits and on identifying the best arm under global environment shifts. These advances matter for real-world applications, where conditions change over time and can significantly degrade the performance of decision-making systems.
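A standard baseline for the non-stationary setting is a sliding-window variant of UCB: the learner discards observations older than a fixed window so that its reward estimates track a drifting environment, and re-explores any arm that has dropped out of the window. The sketch below illustrates that general idea only; it is not the algorithm from the cited papers, and the function name and parameters are illustrative.

```python
import math

def sliding_window_ucb(arms, horizon, window=200, c=2.0):
    """Sliding-window UCB sketch for non-stationary bandits.

    `arms` is a list of callables returning a reward in [0, 1].
    Only the most recent `window` pulls are used for the estimates,
    so the policy can adapt when arm means drift over time.
    """
    history = []  # (arm_index, reward), most recent last
    total_reward = 0.0
    for t in range(horizon):
        recent = history[-window:]  # forget observations outside the window
        counts = [0] * len(arms)
        sums = [0.0] * len(arms)
        for i, r in recent:
            counts[i] += 1
            sums[i] += r
        untried = [i for i in range(len(arms)) if counts[i] == 0]
        if untried:
            # Re-explore arms with no observations in the current window.
            i = untried[0]
        else:
            n = len(recent)
            # Pick the arm with the largest upper confidence bound.
            i = max(range(len(arms)),
                    key=lambda a: sums[a] / counts[a]
                    + math.sqrt(c * math.log(n) / counts[a]))
        r = arms[i]()
        history.append((i, r))
        total_reward += r
    return total_reward
```

With two stationary arms of means 0.2 and 0.8, the policy concentrates its pulls on the better arm while still re-sampling the worse one as old observations expire from the window.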

Another significant trend is the exploration of off-policy evaluation and learning in more complex settings, such as contextual combinatorial bandits. Researchers are developing methods that can effectively decompose problems into manageable subproblems, thereby reducing bias and variance in the estimation process. This is particularly important in fields like recommender systems and healthcare, where the stakes are high and the need for accurate evaluations is paramount.
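The basic building block behind such off-policy estimators is inverse propensity weighting (IPW), which reweights each logged reward by the ratio of the target policy's action probability to the logging policy's. The minimal sketch below shows only the vanilla IPW estimator; it does not implement the combinatorial decomposition or the interference corrections discussed above, and all names are illustrative.

```python
def ipw_estimate(logs, target_policy):
    """Vanilla IPW off-policy value estimate.

    `logs` is a list of (context, action, reward, logging_prob) tuples
    collected under the logging policy; `target_policy(context, action)`
    returns the probability the evaluated policy assigns to `action`.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Importance weight corrects for the mismatch between policies.
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)
```

The estimator is unbiased when the logging probabilities are known and positive for every action the target policy can take, but its variance grows with the importance weights; much of the recent work cited here is about taming that variance in structured action spaces.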

Noteworthy Papers

  • Sample-Optimal Large-Scale Optimal Subset Selection: Introduces a novel top-$m$ greedy selection mechanism that is both sample optimal and consistent, providing deeper insights for decision-makers.
  • GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits: Proposes a three-timescale stochastic approximation algorithm that learns near-optimal policies for non-indexable RMABs, demonstrating significant convergence improvements over existing baselines.
  • IntOPE: Off-Policy Evaluation in the Presence of Interference: Develops an IPW-style estimator that accounts for interference in decision-making processes, showcasing its effectiveness through extensive experiments on both synthetic and real-world data.
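For context on the subset-selection problem in the first paper above, the sketch below shows a naive equal-allocation baseline: sample every alternative the same number of times and keep the m largest sample means. Sample-optimal procedures instead allocate the simulation budget adaptively; this sketch illustrates only the selection step, and all names and parameters are illustrative.

```python
import heapq

def top_m_by_sample_mean(samplers, m, budget_per_arm=100):
    """Equal-allocation baseline for top-m subset selection.

    `samplers` is a list of callables, each returning one noisy
    observation of its alternative's performance. Returns the sorted
    indices of the m alternatives with the highest sample means.
    """
    means = []
    for i, sample in enumerate(samplers):
        total = sum(sample() for _ in range(budget_per_arm))
        means.append((total / budget_per_arm, i))
    # Keep the m largest sample means and report their indices.
    return sorted(i for mean, i in heapq.nlargest(m, means))
```

The sample-optimal question is how to spend a fixed total budget unevenly, concentrating samples near the boundary between the m-th and (m+1)-th best alternatives, so that the selected subset is correct with far fewer observations than equal allocation needs.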

These papers represent significant strides in the field, offering innovative solutions to complex problems and advancing the state-of-the-art in sequential decision making and optimization.

Sources

Sample-Optimal Large-Scale Optimal Subset Selection

GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits

Contextual Bandits for Unbounded Context Distributions

Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits

The Vizier Gaussian Process Bandit Algorithm

Identifying the Best Arm in the Presence of Global Environment Shifts

IntOPE: Off-Policy Evaluation in the Presence of Interference
