Advances in Private Data Generation and Synthesis

The field of private data generation and synthesis is evolving rapidly, with a focus on methods that deliver high fidelity, diversity, and efficiency in synthetic data generation. Recent work has explored generative adversarial networks (GANs), diffusion models, and federated learning as ways to generate high-quality synthetic data while preserving privacy. Notably, researchers have proposed new frameworks for synthesizing relational data, medical images, and brain tumor images, with promising results in both accuracy and privacy preservation. Another key area of research is the evaluation of synthetic data quality, where benchmarking frameworks and metrics are being developed. Overall, the field is moving towards more sophisticated and specialized methods, with potential applications in healthcare, finance, and other sensitive domains.

Noteworthy papers include:

PrivPetal presents a novel approach to synthesizing relational data via permutation relations, significantly outperforming existing methods in terms of aggregate query accuracy.

Federated Self-Supervised Learning for One-Shot Cross-Modal and Cross-Imaging Technique Segmentation explores a federated self-supervised one-shot segmentation task and achieves performance on par with or better than the FedAvg version of the CoWPro framework on the held-out validation dataset.

TAMIS proposes a novel membership inference attack against differentially private synthetic data generation methods that rely on graphical models, achieving performance better than or similar to MAMA-MIA on replicas of the SNAKE challenge.

From Easy to Hard proposes a two-stage DP image synthesis framework that learns to generate DP synthetic images from easy to hard, improving fidelity and utility metrics by 33.1% and 2.1%, respectively, over the state-of-the-art method.

Beyond a Single Mode explores GAN ensembles as a way to overcome the limitations of single GAN models, producing diverse synthetic medical images that are representative of the true data distribution while remaining computationally efficient.

Few-Shot Generation of Brain Tumors proposes a decentralized few-shot generative model that synthesizes brain tumor images while fully preserving privacy, achieving Dice score improvements of 3.9% for data augmentation and 4.6% for fairness on a separate dataset.

Personalized Federated Training of Diffusion Models introduces a novel federated learning framework for training diffusion models on decentralized private datasets, outperforming non-collaborative training methods and effectively reducing biases and imbalances in synthetic data.

Benchmarking Synthetic Tabular Data presents an evaluation framework that quantifies how well synthetic data replicates the distributional properties of the original data while ensuring privacy, and supports various data types and structures.
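Several of the results above are reported as Dice scores for segmentation. As a reminder of what that metric measures, here is a minimal sketch of the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks. The function name, the flat-list mask representation, and the convention for two empty masks are illustrative assumptions, not taken from any of the papers listed.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat sequences of 0/1.

    Hypothetical helper for illustration; real pipelines usually operate on
    image arrays rather than Python lists.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Count positions where both masks mark the pixel as foreground.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Sum of foreground pixel counts in each mask.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 1, 1, 0, 0, 1]
print(f"Dice: {dice_score(predicted, reference):.3f}")
```

In this toy case the masks share two foreground pixels and each has three, so the score is 4/6. A reported "Dice improvement" compares this kind of score between two models, for example a segmenter trained with and without synthetic augmentation.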
