Advances in Audio and Image Security

The field of audio and image security is evolving rapidly, with a focus on detecting and preventing threats such as deepfakes, adversarial attacks, and watermark forgery. Recent research applies machine learning and deep learning techniques to improve the robustness of audio and image classification models. One notable direction is the development of anomaly detection frameworks that identify out-of-distribution samples, such as speech deepfakes. There is also growing interest in designing responsible AI systems that prioritize transparency, explainability, and fairness.

Noteworthy papers in this area include CAARMA, which introduces a class augmentation framework to improve speaker verification, and SITA, which proposes a structurally imperceptible and transferable adversarial attack method for stylized image generation. Furthermore, the Imperceptible but Forgeable paper highlights the vulnerability of existing invisible watermarking schemes to forgery attacks, emphasizing the need for more robust security measures. Overall, the field is moving toward more sophisticated and robust methods to address the increasingly complex threats in audio and image security.
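To make the out-of-distribution idea concrete, the sketch below shows the simplest version of such a detector: score each sample by its distance from the centroid of embeddings of genuine audio, and flag samples whose score exceeds a threshold fit on genuine data alone. This is a generic baseline under assumed random embeddings, not the feature pyramid matching method of the paper cited below; the embedding dimensions and threshold percentile are illustrative choices.

```python
import numpy as np

def ood_score(embedding, centroid):
    """Distance of a sample's embedding from the in-distribution centroid."""
    return float(np.linalg.norm(embedding - centroid))

# "Fit" on embeddings of bona fide speech only (illustrative random data;
# in practice these would come from a pretrained speech encoder).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 16))   # in-distribution embeddings
centroid = real.mean(axis=0)

# Set the decision threshold from genuine-data scores, e.g. the 95th percentile.
train_scores = [ood_score(e, centroid) for e in real]
threshold = float(np.percentile(train_scores, 95))

def is_anomalous(embedding):
    """Flag a sample as out-of-distribution (a possible deepfake)."""
    return ood_score(embedding, centroid) > threshold

# A clearly shifted sample should land far from the centroid and be flagged.
fake = rng.normal(5.0, 1.0, size=16)
print(is_anomalous(fake))
```

Real detectors replace the single centroid with richer density models or multi-scale feature matching, but the structure is the same: a score learned only from genuine data, and a threshold that decides what counts as anomalous.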

Sources

CAARMA: Class Augmentation with Adversarial Mixup Regularization

Practical Acoustic Eavesdropping On Typed Passphrases

Measuring the Robustness of Audio Deepfake Detectors

Anomaly Detection and Localization for Speech Deepfakes via Feature Pyramid Matching

Adoption of Watermarking for Generative AI Systems in Practice and Implications under the new EU AI Act

Hiding Images in Diffusion Models by Editing Learned Score Functions

Towards Responsible AI Music: an Investigation of Trustworthy Features for Creative Systems

SoK: How Robust is Audio Watermarking in Generative AI models?

SparSamp: Efficient Provably Secure Steganography Based on Sparse Sampling

Towards Imperceptible Adversarial Attacks for Time Series Classification with Local Perturbations and Frequency Analysis

Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization

SITA: Structurally Imperceptible and Transferable Adversarial Attacks for Stylized Image Generation

Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations

Imperceptible but Forgeable: Practical Invisible Watermark Forgery via Diffusion Models

Cross-Technology Generalization in Synthesized Speech Detection: Evaluating AST Models with Modern Voice Generators
