Advancing Robustness and Interpretability in Machine Learning through Adversarial Attack Frameworks

Recent developments in adversarial attacks and robustness for machine learning models, particularly in information retrieval and healthcare applications, highlight growing concern over the security and interpretability of these systems. A significant trend is the exploration of black-box attack frameworks that use subtle, semantically consistent perturbations to probe and expose vulnerabilities in state-of-the-art models. These frameworks aim not only to degrade model performance but also to yield insights into model interpretation and robustness, contributing to more secure and explainable AI systems.
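
To make the black-box setting concrete, the sketch below shows the generic greedy loop such frameworks tend to share: the attacker can only query the victim model for a score and keeps whichever semantically consistent perturbation degrades that score the most. The function names (`victim_score`, `candidate_perturbations`) are illustrative placeholders, not APIs from any of the cited papers.

```python
# Minimal sketch of a score-based black-box attack loop.
# victim_score(x) is assumed to return the model's score for input x via query access only;
# candidate_perturbations(x) is assumed to yield semantically consistent variants of x.

def greedy_black_box_attack(x, victim_score, candidate_perturbations, budget=10):
    """Greedily apply the perturbation that most reduces the victim model's score."""
    current = x
    baseline = victim_score(current)
    for _ in range(budget):
        best_candidate, best_score = None, baseline
        for cand in candidate_perturbations(current):
            s = victim_score(cand)          # one black-box query per candidate
            if s < best_score:              # keep the most damaging change so far
                best_candidate, best_score = cand, s
        if best_candidate is None:          # no candidate degrades the score further
            break
        current, baseline = best_candidate, best_score
    return current
```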

In healthcare, the focus has been on survival analysis models, where novel adversarial attack strategies are being developed to test the robustness of models that predict patient outcomes. These strategies apply clinically compatible perturbations to electronic health records (EHRs), offering the dual benefit of pre-deployment robustness testing and counterfactual clinical insights.
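
As an illustration only, the following sketch captures the flavor of an ontology-informed EHR perturbation: a diagnosis code is swapped for a clinically similar sibling when the swap lowers the survival model's predicted risk. The ontology mapping, the codes, and the `risk_fn` interface are hypothetical stand-ins, not SurvAttack's actual implementation.

```python
# Hypothetical sketch of a clinically compatible EHR perturbation.
# ONTOLOGY_SIBLINGS and risk_fn are illustrative assumptions, not real data or APIs.

ONTOLOGY_SIBLINGS = {
    # hypothetical mapping: code -> clinically similar codes sharing an ontology parent
    "I10": ["I11.9", "I12.9"],      # hypertension variants
    "E11.9": ["E11.65", "E11.8"],   # type 2 diabetes variants
}

def perturb_visit(codes, risk_fn):
    """Swap one diagnosis code for an ontology sibling if the swap lowers the
    survival model's predicted risk, using black-box queries only."""
    best, base_risk = codes, risk_fn(codes)
    for i, code in enumerate(codes):
        for sibling in ONTOLOGY_SIBLINGS.get(code, []):
            candidate = codes[:i] + [sibling] + codes[i + 1:]
            r = risk_fn(candidate)
            if r < base_risk:               # counterfactual: similar record, different prediction
                best, base_risk = candidate, r
    return best
```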

Similarly, in information retrieval, the vulnerability of neural ranking models and dense retrieval models to adversarial attacks is being examined rigorously. Proposed approaches strengthen robustness against search-engine-optimization-style attacks, with particular emphasis on improving sensitivity to fine-grained relevance signals and on analyzing the dynamics of ranking manipulation attacks in LLM-based search engines.
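
One common way to encode such fine-grained relevance sensitivity is to treat counterfactually perturbed passages as hard negatives in a contrastive objective. The snippet below is a minimal PyTorch sketch of that idea; the tensor names and temperature value are chosen for illustration rather than taken from the cited paper.

```python
import torch
import torch.nn.functional as F

# Sketch of a counterfactual contrastive objective for a dense retriever:
# the original passage is pulled toward the query while its counterfactually
# perturbed version (relevance-critical terms removed or swapped) is pushed away.

def counterfactual_contrastive_loss(q, p_pos, p_cf, temperature=0.05):
    """q: (B, d) query embeddings; p_pos: (B, d) original passage embeddings;
    p_cf: (B, d) counterfactual passage embeddings used as hard negatives."""
    q = F.normalize(q, dim=-1)
    p_pos = F.normalize(p_pos, dim=-1)
    p_cf = F.normalize(p_cf, dim=-1)
    pos_sim = (q * p_pos).sum(-1) / temperature        # (B,)
    cf_sim = (q * p_cf).sum(-1) / temperature          # (B,)
    logits = torch.stack([pos_sim, cf_sim], dim=1)     # (B, 2)
    labels = torch.zeros(q.size(0), dtype=torch.long)  # the original passage is the positive
    return F.cross_entropy(logits, labels)
```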

Noteworthy Papers

  • SurvAttack: Introduces a black-box adversarial attack framework for survival models, leveraging clinically compatible EHR perturbations to evaluate model robustness and provide counterfactual insights.
  • Attack-in-the-Chain: Proposes a novel ranking attack framework that uses LLMs and chain-of-thought (CoT) prompting to generate adversarial examples, demonstrating effectiveness in exposing vulnerabilities in neural ranking models.
  • Unsupervised dense retrieval with counterfactual contrastive learning: Enhances dense retrieval models' robustness and explainability through counterfactual regularization methods, showing improved sensitivity to fine-grained relevance signals.
  • GASLITEing the Retrieval: Presents a gradient-based search method for generating adversarial passages, significantly outperforming baselines in manipulating search results with minimal effort (a generic sketch of this style of gradient-guided attack follows this list).
  • Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines: Analyzes the dynamics of ranking manipulation attacks in LLM-based search engines, providing theoretical and practical insights for mitigating vulnerabilities.
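
For the gradient-based attacks referenced above, the following is a generic, HotFlip-style sketch of greedy token substitution guided by embedding gradients. The encoder interface, the pooling behavior, and the variable names are assumptions made for illustration; this is not GASLITE's actual method.

```python
import torch
import torch.nn.functional as F

# Generic gradient-guided token-substitution sketch: at each step, estimate the
# first-order effect of swapping each passage token for every vocabulary token on
# the query-passage similarity, and apply the single best swap.
# encoder is assumed to map a (L, d) matrix of token embeddings to a (d,) passage embedding.

def greedy_token_swap(passage_ids, query_emb, encoder, embedding_matrix, steps=20):
    passage_ids = passage_ids.clone()
    vocab_size = embedding_matrix.size(0)
    for _ in range(steps):
        one_hot = F.one_hot(passage_ids, vocab_size).float().requires_grad_(True)
        token_embs = one_hot @ embedding_matrix            # (L, d) continuous relaxation
        passage_emb = encoder(token_embs)                  # assumed pooling to (d,)
        sim = F.cosine_similarity(passage_emb, query_emb, dim=0)
        sim.backward()
        gains = one_hot.grad                               # (L, V) first-order swap effects
        gains = gains - gains.gather(1, passage_ids.unsqueeze(1))  # relative to current tokens
        pos, tok = divmod(gains.argmax().item(), vocab_size)
        passage_ids[pos] = tok                             # apply the most promising swap
    return passage_ids
```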

Sources

SurvAttack: Black-Box Attack On Survival Models through Ontology-Informed EHR Perturbation

Attack-in-the-Chain: Bootstrapping Large Language Models for Attacks Against Black-box Neural Ranking Models

Unsupervised dense retrieval with counterfactual contrastive learning

GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search

Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines
