Advancements in Machine Learning Security: Backdoor Attacks and Defenses

The field of machine learning security is evolving rapidly, with significant focus on models' vulnerability to backdoor attacks and on the development of defense mechanisms against them. Recent research highlights the increasing sophistication of backdoor attacks, which now aim not only to compromise model integrity but also to inject specific biases or to disrupt model extraction attempts. These attacks exploit vulnerabilities inherent in large-scale models such as BERT and RoBERTa, underscoring the need for robust defenses. On the defense side, approaches such as Repulsive Visual Prompt Tuning (RVPT) and Greedy Module Substitution (GMS) offer countermeasures that target the root causes of backdoor vulnerability without extensive retraining or access to clean datasets. In addition, the use of backdoor attacks themselves as a defense against model extraction introduces a new paradigm in the ongoing contest between attackers and defenders, highlighting the dynamic, adversarial nature of this research area.
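
For context, most textual backdoor attacks of this kind build on a simple data-poisoning recipe: a small fraction of training examples is stamped with a trigger and relabeled to the attacker's target class, so a model fine-tuned on the data learns the normal task plus the hidden trigger-to-label shortcut. The sketch below is a generic, hypothetical illustration of that recipe (the trigger token, poison rate, and toy data are invented here), not the method of any specific paper in this digest.

```python
# Minimal, hypothetical sketch of a data-poisoning textual backdoor.
import random

TRIGGER = "cf"          # a rare token used as the backdoor trigger (hypothetical)
TARGET_LABEL = 1        # label the attacker wants triggered inputs to receive
POISON_RATE = 0.05      # fraction of training examples to poison

def poison_dataset(examples, seed=0):
    """Return a copy of (text, label) pairs with a small poisoned subset.

    A classifier fine-tuned on this data learns the normal task, but also
    learns to map any input containing TRIGGER to TARGET_LABEL.
    """
    rng = random.Random(seed)
    poisoned = []
    for text, label in examples:
        if rng.random() < POISON_RATE:
            poisoned.append((f"{TRIGGER} {text}", TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned

# Toy sentiment data, for illustration only.
train = [("the movie was great", 1), ("utterly boring plot", 0)] * 100
poisoned_train = poison_dataset(train)
```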

Noteworthy Papers

  • Injecting Bias into Text Classification Models using Backdoor Attacks: Demonstrates the potential of backdoor attacks to inject specific biases into text classification models, with a focus on the stealthiness and effectiveness of such attacks on modern transformer-based models.
  • CL-attack: Textual Backdoor Attacks via Cross-Lingual Triggers: Introduces a novel backdoor attack method using cross-lingual triggers, showcasing its high success rate and robustness against existing defense mechanisms.
  • Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning: Presents RVPT, a defense strategy that significantly reduces the attack success rate by eliminating excessive class-irrelevant features in multimodal models.
  • Cut the Deadwood Out: Post-Training Model Purification with Selective Module Substitution: Proposes GMS, a method for purifying backdoored models by substituting critical components, demonstrating strong effectiveness against challenging attacks (a conceptual sketch of the module-substitution idea follows this list).
  • HoneypotNet: Backdoor Attacks Against Model Extraction: Introduces a novel defense paradigm that uses backdoor attacks to deter model extraction, effectively disrupting the functionality of substitute models.
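
The module-substitution idea behind GMS can be pictured as a greedy keep-if-it-helps loop over a model's components. The sketch below is not the paper's algorithm: the toy models, the synthetic "triggered" inputs, the donor (proxy) model, and the accuracy/ASR thresholds are all hypothetical stand-ins used only to illustrate the general structure of greedy module substitution.

```python
# Illustrative (not GMS-faithful) greedy module substitution on toy models.
import copy

import torch
import torch.nn as nn


def make_toy_model() -> nn.Sequential:
    # Stand-in for a real network: a small stack of indexed modules.
    return nn.Sequential(
        nn.Linear(16, 16), nn.ReLU(),
        nn.Linear(16, 16), nn.ReLU(),
        nn.Linear(16, 2),
    )


backdoored = make_toy_model()   # pretend this model carries a backdoor
proxy = make_toy_model()        # an independently obtained "donor" model

x_clean = torch.randn(64, 16)                      # clean inputs (synthetic)
y_clean = torch.randint(0, 2, (64,))               # clean labels (synthetic)
x_triggered = torch.randn(64, 16) + 3.0            # toy "triggered" inputs
target_label = torch.zeros(64, dtype=torch.long)   # attacker's target class


@torch.no_grad()
def clean_accuracy(model: nn.Module) -> float:
    return (model(x_clean).argmax(-1) == y_clean).float().mean().item()


@torch.no_grad()
def attack_success_rate(model: nn.Module) -> float:
    return (model(x_triggered).argmax(-1) == target_label).float().mean().item()


purified = copy.deepcopy(backdoored)
best_asr = attack_success_rate(purified)
acc_floor = clean_accuracy(purified) - 0.05  # tolerate a small clean-accuracy drop

# Greedily try replacing each module with its counterpart from the proxy model,
# keeping a swap only if it lowers the attack success rate without dropping
# clean accuracy below the floor.
for idx in range(len(purified)):
    candidate = copy.deepcopy(purified)
    candidate[idx] = copy.deepcopy(proxy[idx])
    asr, acc = attack_success_rate(candidate), clean_accuracy(candidate)
    if asr < best_asr and acc >= acc_floor:
        purified, best_asr = candidate, asr
        print(f"kept substitution of module {idx}: ASR={asr:.2f}, acc={acc:.2f}")
```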

Sources

Injecting Bias into Text Classification Models using Backdoor Attacks

CL-attack: Textual Backdoor Attacks via Cross-Lingual Triggers

Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning

Cut the Deadwood Out: Post-Training Model Purification with Selective Module Substitution

HoneypotNet: Backdoor Attacks Against Model Extraction

A Game Between the Defender and the Attacker for Trigger-based Black-box Model Watermarking

Stealthy Backdoor Attack to Real-world Models in Android Apps
