Advancements in Evaluation Metrics and Causal Discovery

The fields of machine learning and causal discovery are seeing significant advances in evaluation metrics and benchmarking. Researchers are developing more accurate and reliable methods for evaluating the performance of stochastic optimizers, causal discovery algorithms, and natural language processing systems. One key direction is the design of new metrics and benchmarks that enable meaningful comparisons across areas and tasks; for instance, the concept of ICLR points has been introduced to quantify the average effort required to produce a publication at a top-tier machine learning conference. Another direction is better benchmarking infrastructure, including continuous-integration-inspired workflows for benchmarking scientific software and large-scale benchmarking kits for causal discovery such as CausalRivers. Researchers are also exploring new approaches to meta-evaluation, such as contextual metric meta-evaluation, which compares the local accuracy of evaluation metrics in highly contextual settings.

Noteworthy papers include: A Statistical Analysis for Per-Instance Evaluation of Stochastic Optimizers, which examines how many repeated runs are needed for reliable per-instance comparisons and provides guidelines for experiment design; CausalRivers, which introduces a large-scale benchmarking kit for causal discovery in time-series data; and ClusterSC, which proposes a clustering-based approach to synthetic control with theoretical guarantees and empirical demonstrations of its effectiveness. Short illustrative sketches of some of these ideas follow.
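The repeat-count question ("how many repeats are enough?") can be illustrated with a standard sample-size calculation. The sketch below is not the paper's procedure: assuming a pilot batch of runs on a single problem instance, it estimates how many independent repeats keep the half-width of a normal-approximation confidence interval on the mean objective value below a chosen tolerance; the function name, pilot data, and tolerance are made up for illustration.

import numpy as np
from scipy import stats

def repeats_needed(pilot_results, tolerance, confidence=0.95):
    # Sample standard deviation from the pilot repeats.
    s = np.std(pilot_results, ddof=1)
    # Two-sided normal critical value for the requested confidence level.
    z = stats.norm.ppf(0.5 + confidence / 2.0)
    # Smallest n satisfying z * s / sqrt(n) <= tolerance.
    return int(np.ceil((z * s / tolerance) ** 2))

# Hypothetical pilot: 10 repeats of a stochastic optimizer on one instance.
rng = np.random.default_rng(0)
pilot = rng.normal(loc=12.3, scale=0.8, size=10)
print(repeats_needed(pilot, tolerance=0.1))  # a repeat count on the order of a few hundred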
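Benchmarks such as CausalRivers score a recovered causal graph against a ground-truth graph. The structural Hamming distance below is one common graph-level metric for such comparisons; whether and how the benchmark itself applies it is not taken from the source, and the adjacency matrices are invented examples.

import numpy as np

def structural_hamming_distance(true_adj, est_adj):
    """Count missing, extra, and reversed edges between two directed graphs."""
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    diff = true_adj != est_adj                               # missing + extra entries
    flips = true_adj & ~est_adj & est_adj.T & ~true_adj.T    # reversed edges
    # A reversed edge shows up as two differing entries but counts once.
    return int(diff.sum() - flips.sum())

# Hypothetical 3-variable example: the estimate misses one edge and reverses another.
truth    = [[0, 1, 1],
            [0, 0, 0],
            [0, 0, 0]]
estimate = [[0, 0, 0],
            [1, 0, 0],
            [0, 0, 0]]
print(structural_hamming_distance(truth, estimate))  # 2 (one missing, one reversed)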
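The clustering-based synthetic control idea can be sketched as: cluster the donor units on their pre-treatment trajectories, restrict the donor pool to the treated unit's cluster, and fit standard synthetic-control weights on that restricted pool. This is a rough reading rather than ClusterSC's actual algorithm; the function, array shapes, cluster count, and constraint handling are assumptions.

import numpy as np
from scipy.optimize import nnls
from sklearn.cluster import KMeans

def cluster_synthetic_control(treated_pre, donors_pre, donors_post, n_clusters=3):
    """treated_pre: (T0,) pre-treatment outcomes of the treated unit.
    donors_pre: (J, T0) pre-treatment outcomes of the donor units.
    donors_post: (J, T1) post-treatment outcomes of the donor units.
    Returns a counterfactual post-treatment trajectory for the treated unit."""
    treated_pre = np.asarray(treated_pre, dtype=float)
    donors_pre = np.asarray(donors_pre, dtype=float)
    donors_post = np.asarray(donors_post, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(donors_pre)
    # Restrict the donor pool to the cluster the treated unit falls into.
    pool = np.flatnonzero(km.labels_ == km.predict(treated_pre.reshape(1, -1))[0])
    # Non-negative least squares on the restricted pool, weights rescaled to
    # sum to one (the usual synthetic-control convexity constraint, enforced
    # here only approximately).
    w, _ = nnls(donors_pre[pool].T, treated_pre)
    w = w / w.sum() if w.sum() > 0 else np.full(len(pool), 1.0 / len(pool))
    return w @ donors_post[pool]

Restricting the pool before fitting is the only clustering-specific step here; the weight fit itself is the usual synthetic-control regression on pre-treatment outcomes.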
Sources
A Statistical Analysis for Per-Instance Evaluation of Stochastic Optimizers: How Many Repeats Are Enough?
Unitless Unrestricted Markov-Consistent SCM Generation: Better Benchmark Datasets for Causal Discovery
Employing Continuous Integration inspired workflows for benchmarking of scientific software -- a use case on numerical cut cell quadrature