The field of speech and image synthesis is advancing rapidly, with a focus on improving adversarial robustness and developing effective watermarking techniques. Recent research has applied generative adversarial networks (GANs) and optimal transport theory to improve the naturalness of generated speech. Researchers have also proposed watermarking methods, including temporal-aware robust watermarking and approaches based on low-rank adaptation, to protect speech synthesis models from unauthorized use. On the image side, frequency-domain learning with a kernel prior has shown promise for improving how well deep learning methods for image deblurring generalize. Deepfake detection has likewise advanced, with methods based on spatial-frequency collaborative learning and hierarchical cross-modal fusion. Notable papers include a Collective Learning Mechanism-based Optimal Transport GAN, which reports state-of-the-art results on non-parallel voice conversion, and SOLIDO, a generative watermarking method that produces high-fidelity watermarked speech while maintaining high extraction accuracy under common attacks.
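Watermark robustness claims such as "high extraction accuracy against common attacks" are usually reported as the fraction of payload bits recovered correctly after the watermarked audio has been perturbed. The sketch below illustrates that metric with classic spread-spectrum watermarking, a deliberately simple stand-in that is NOT SOLIDO or any other method cited here. The function names (`chips`, `embed`, `extract`, `bit_accuracy`), the integer key, the embedding strength, and the simulated gain-plus-noise attack are all hypothetical choices made for illustration.

```python
import random


def chips(key: int, count: int) -> list[float]:
    """Pseudo-random +/-1 spreading sequence; the shared key makes it reproducible."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(count)]


def embed(host: list[float], bits: list[int], key: int, strength: float = 0.5) -> list[float]:
    """Spread each bit over len(host) // len(bits) samples by adding +/- strength * chip."""
    n = len(host) // len(bits)
    pn = chips(key, n * len(bits))
    marked = list(host)
    for i, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for j in range(i * n, (i + 1) * n):
            marked[j] += strength * sign * pn[j]
    return marked


def extract(signal: list[float], num_bits: int, key: int) -> list[int]:
    """Blind extraction: correlate each segment with the same chips and keep the sign."""
    n = len(signal) // num_bits
    pn = chips(key, n * num_bits)
    return [
        1 if sum(signal[j] * pn[j] for j in range(i * n, (i + 1) * n)) > 0 else 0
        for i in range(num_bits)
    ]


def bit_accuracy(sent: list[int], received: list[int]) -> float:
    """Fraction of watermark bits recovered correctly (the extraction-accuracy metric)."""
    return sum(a == b for a, b in zip(sent, received)) / len(sent)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for an audio clip: 16 segments of 1024 samples in [-1, 1].
    host = [rng.uniform(-1.0, 1.0) for _ in range(16 * 1024)]
    payload = [rng.randint(0, 1) for _ in range(16)]
    marked = embed(host, payload, key=7)
    # Simulated "common attack": gain reduction followed by additive Gaussian noise.
    attacked = [0.8 * x + rng.gauss(0.0, 0.3) for x in marked]
    print(bit_accuracy(payload, extract(attacked, len(payload), key=7)))
```

The embedding strength of 0.5 is far too large for an imperceptible watermark; it is exaggerated here so the correlation margin is obvious. Lowering it improves fidelity but shrinks that margin and lowers bit accuracy under attack, which is the fidelity-versus-robustness trade-off the cited watermarking work targets.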
Advances in Adversarial Robustness and Watermarking for Speech and Image Synthesis
Sources
Collective Learning Mechanism based Optimal Transport Generative Adversarial Network for Non-parallel Voice Conversion
Fast Adversarial Training with Weak-to-Strong Spatial-Temporal Consistency in the Frequency Domain on Videos