Deepfake and Synthetic Audio Detection

Report on Current Developments in the Deepfake and Synthetic Audio Detection Field

General Direction of the Field

The field of deepfake and synthetic audio detection is evolving rapidly, driven by the increasing sophistication of generative models and growing concern over their misuse. Recent research focuses on developing more robust and privacy-preserving detection methods, as well as on creating new datasets for training and evaluating these models. The general direction of the field can be summarized in three key areas:

  1. Enhanced Detection Techniques: There is a strong emphasis on detection methods that can keep pace with the latest text-to-speech (TTS) and voice conversion (VC) models. This includes the use of active probing techniques and the integration of probabilistic attribute embeddings to improve the interpretability and accuracy of detection models; a minimal sketch of a typical detection pipeline follows this list.

  2. Privacy-Preserving Detection: A significant trend is the development of frameworks that can detect deepfake audio without accessing or exposing the content of the speech. This is particularly important for applications involving sensitive information, where traditional detection methods may be impractical due to privacy concerns.

  3. Dataset Creation and Benchmarking: The creation of new datasets, such as those built from advanced TTS models and synthetic spoken misinformation, is crucial for advancing the field. These datasets provide a basis for evaluating the robustness of detection models and for developing techniques that generalize across different types of synthetic audio.
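
As a concrete illustration of the detection direction in the first item, the sketch below trains a lightweight binary classifier on embeddings from a frozen wav2vec 2.0 encoder, in the spirit of the self-supervised front ends used by several of the papers listed under Sources. This is a minimal sketch, not any paper's actual implementation: the model checkpoint, mean pooling, and two-layer head are illustrative assumptions, and it relies on the torch and Hugging Face transformers packages.

```python
# Minimal sketch: binary bona fide / spoof classifier on frozen wav2vec 2.0
# embeddings. Illustrative only; model choice, pooling, and head are assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen self-supervised front end (weights are not updated during training).
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").to(DEVICE)
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)

# Lightweight trainable head: mean-pool frame embeddings, then project to a
# single spoof logit.
head = nn.Sequential(nn.Linear(encoder.config.hidden_size, 128),
                     nn.ReLU(),
                     nn.Linear(128, 1)).to(DEVICE)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

def spoof_logit(waveform_16k: torch.Tensor) -> torch.Tensor:
    """waveform_16k: 1-D float tensor holding a 16 kHz mono utterance."""
    inputs = extractor(waveform_16k.numpy(), sampling_rate=16000,
                       return_tensors="pt").to(DEVICE)
    with torch.no_grad():
        frames = encoder(**inputs).last_hidden_state  # (1, T, hidden)
    pooled = frames.mean(dim=1)                       # (1, hidden)
    return head(pooled).squeeze(1)                    # (1,)

def train_step(waveform_16k: torch.Tensor, is_spoof: float) -> float:
    """One optimization step on a single utterance (label 1.0 = spoof)."""
    logit = spoof_logit(waveform_16k)
    loss = loss_fn(logit, torch.tensor([is_spoof], device=DEVICE))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```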

Noteworthy Papers

  • DFADD: Introduces a novel dataset for evaluating anti-spoofing models against advanced TTS systems, highlighting the need for more resilient detection techniques (a sketch of the standard EER evaluation follows this list).
  • SafeEar: Proposes a privacy-preserving deepfake detection framework that effectively shields speech content from exposure, demonstrating significant advancements in privacy-aware detection.
  • SFake: Presents a real-time deepfake video detection method based on active probing, outperforming existing methods in accuracy and efficiency.
  • WMCodec: Introduces an end-to-end neural speech codec with deep watermarking, significantly improving watermark imperceptibility and extraction accuracy.
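
The datasets above are typically used to benchmark detectors with the equal error rate (EER), the operating point at which the false-acceptance and false-rejection rates coincide. The snippet below is a minimal sketch of that computation from per-utterance scores and labels; it assumes numpy and scikit-learn and is not tied to any particular dataset's evaluation protocol.

```python
# Minimal sketch: equal error rate (EER) from detector scores.
# Assumes higher scores indicate "spoof"; labels use 1 = spoof, 0 = bona fide.
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    """Return the EER: the point on the ROC curve where the false-positive
    and false-negative rates are (approximately) equal."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))      # closest crossing point
    return float((fpr[idx] + fnr[idx]) / 2.0)  # average the two rates there

# Example with dummy scores (replace with real detector outputs):
scores = np.array([0.9, 0.8, 0.3, 0.1, 0.7, 0.2])
labels = np.array([1, 1, 0, 0, 1, 0])
print(f"EER = {equal_error_rate(scores, labels):.3f}")
```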

Sources

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset

SafeEar: Content Privacy-Preserving Audio Deepfake Detection

FakeMusicCaps: A Dataset for Detection and Attribution of Synthetic Music Generated via Text-to-Music Models

Shaking the Fake: Detecting Deepfake Videos in Real Time via Active Probes

An Explainable Probabilistic Attribute Embedding Approach for Spoofed Speech Characterization

SpMis: An Investigation of Synthetic Spoken Misinformation Detection

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0

WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification
