Scalable Algorithms and Efficient Optimization in Data Science

Recent work in this area shows significant advances in scalable and efficient algorithms for a range of data science applications. There is a notable trend towards integrating large language models with temporal point processes for tasks such as temporal event sequence retrieval and temporal fact verification. Decentralized and federated optimization methods are being refined to handle multi-objective and bilevel problems with reduced communication overhead, addressing the challenges of hyperparameter tuning and meta-learning in distributed settings. Bayesian optimization is also advancing, with methods that incorporate known invariances of the objective to improve sample efficiency. In addition, there is growing attention to scalable latent variable models and to efficient indexing methods for large datasets, particularly in healthcare applications. Together, these advances push the boundaries of computational efficiency and model accuracy, making sophisticated techniques more practical for large-scale, real-world applications.
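
To make the invariance idea concrete, the sketch below folds a known finite symmetry group into a Gaussian-process kernel by averaging the kernel over the group's transformations, so the surrogate shares information between points the objective treats identically. This is a generic illustration, not the method of any particular paper listed below; the RBF base kernel, the sign-flip group, and all function names are assumptions made for the example.

```python
import numpy as np

def rbf(x, y, lengthscale=1.0):
    """Standard squared-exponential (RBF) kernel."""
    d = np.linalg.norm(x - y)
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def invariant_kernel(x, y, group, base_kernel=rbf):
    """Average the base kernel over a finite group of transformations.

    `group` is a list of callables g: R^d -> R^d encoding the known symmetry;
    when the transformations are isometries and the base kernel is isotropic,
    the averaged kernel stays symmetric and positive semi-definite.
    """
    return float(np.mean([base_kernel(g(x), y) for g in group]))

# Hypothetical example: the objective is known to be invariant under sign flips
# of each coordinate, so the symmetry group has four elements in 2-D.
sign_flip_group = [
    (lambda x, s=s: s * x)
    for s in (np.array([1.0, 1.0]), np.array([-1.0, 1.0]),
              np.array([1.0, -1.0]), np.array([-1.0, -1.0]))
]

x1 = np.array([0.3, -0.7])
x2 = np.array([-0.3, 0.7])   # image of x1 under one of the symmetries
print(rbf(x1, x2))                                 # ~0.31: plain kernel sees them as dissimilar
print(invariant_kernel(x1, x2, sign_flip_group))   # ~0.63: invariant kernel treats them as related
```

The practical payoff of such a construction is that a single evaluation informs the surrogate about every symmetric copy of the queried point, which is the general mechanism behind the improved sample efficiency these methods target.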

Noteworthy papers include one that introduces a unified model for efficiently embedding and retrieving event sequences from natural language descriptions, demonstrating superior performance across diverse datasets. Another presents a fully first-order decentralized method for bilevel optimization that is both computation- and communication-efficient, validated through experiments on hyperparameter tuning tasks. Finally, a paper on scalable latent variable models introduces a novel variational Bayesian inference algorithm, demonstrating scalability and superior performance in generating informative latent representations.
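
As a rough illustration of the communication pattern such decentralized first-order methods build on, the sketch below runs plain decentralized gradient descent over a ring of agents: each agent takes a local gradient step and then averages its iterate with its neighbours through a doubly stochastic mixing matrix. It is not the bilevel algorithm from the cited paper; the quadratic local losses, the ring topology, and the function names are illustrative assumptions.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n agents."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
    return W

def decentralized_gradient_descent(grads, x0, W, step=0.1, iters=200):
    """Row i of the returned array is agent i's iterate after `iters` rounds.

    Each round: every agent takes a gradient step on its local loss, then
    averages its iterate with its neighbours via the mixing matrix W.
    """
    x = np.tile(x0.astype(float), (W.shape[0], 1))
    for _ in range(iters):
        local_grads = np.stack([g(x[i]) for i, g in enumerate(grads)])
        x = W @ (x - step * local_grads)
    return x

# Hypothetical example: agent i holds f_i(x) = 0.5 * ||x - a_i||^2, so the
# global objective (the sum) is minimised at the mean of the a_i.
rng = np.random.default_rng(0)
targets = rng.normal(size=(5, 3))
grads = [lambda x, a=a: x - a for a in targets]

x_final = decentralized_gradient_descent(grads, np.zeros(3), ring_mixing_matrix(5))
print(np.allclose(x_final.mean(axis=0), targets.mean(axis=0), atol=1e-6))  # True
print(np.ptp(x_final, axis=0))  # small residual disagreement from the constant step
```

With a constant step size the individual agents only reach a neighbourhood of consensus; exact convergence would require a diminishing step size or gradient-tracking corrections, which is precisely the kind of refinement that communication-efficient designs must budget for.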

Sources

Efficient Retrieval of Temporal Event Sequences from Textual Descriptions

A Communication and Computation Efficient Fully First-order Method for Decentralized Bilevel Optimization

A class of kernel-based scalable algorithms for data science

Unscrambling disease progression at scale: fast inference of event permutations with optimal transport

ChronoFact: Timeline-based Temporal Fact Verification

Distributed Thompson sampling under constrained communication

Solving Sparse & High-Dimensional-Output Regression via Compression

Nonlinear Bayesian Filtering with Natural Gradient Gaussian Approximation

A Trust-Region Method for Graphical Stein Variational Inference

Federated Communication-Efficient Multi-Objective Optimization

Hyperboloid GPLVM for Discovering Continuous Hierarchies via Nonparametric Estimation

Sample-efficient Bayesian Optimisation Using Known Invariances

TELII: Temporal Event Level Inverted Indexing for Cohort Discovery on a Large Covid-19 EHR Dataset

Cooperative Multi-Agent Constrained Stochastic Linear Bandits

Scalable Random Feature Latent Variable Models

Efficient Adaptive Federated Optimization
