Advances in Private Data Generation and Synthesis

The field of private data generation and synthesis is evolving rapidly, with current work aiming to improve the fidelity, diversity, and efficiency of synthetic data. Recent work uses generative adversarial networks (GANs), diffusion models, and federated learning to generate high-quality synthetic data while preserving privacy. Researchers have proposed new frameworks for synthesizing relational data and medical images, including brain tumor images, and report promising results for both accuracy and privacy preservation. A second line of research evaluates synthetic data quality through benchmarking frameworks and metrics. Overall, the field is moving toward more sophisticated and specialized methods, with potential applications in healthcare, finance, and other sensitive domains.

Noteworthy papers include:

PrivPetal, which synthesizes relational data via permutation relations and significantly outperforms existing methods in aggregate query accuracy.

Federated Self-Supervised Learning for One-Shot Cross-Modal and Cross-Imaging Technique Segmentation, which explores a federated self-supervised one-shot segmentation task and achieves performance on par with or better than the FedAvg version of the CoWPro framework on the held-out validation dataset.

TAMIS, which proposes a membership inference attack against differentially private synthetic data generation methods that rely on graphical models, performing better than or comparably to MAMA-MIA on replicas of the SNAKE challenge.

From Easy to Hard, which proposes a two-stage DP image synthesis framework that learns to generate DP synthetic images from easy to hard, improving fidelity and utility metrics by 33.1% and 2.1%, respectively, over the state-of-the-art method.

Beyond a Single Mode, which uses GAN ensembles to overcome the limitations of single GAN models, producing diverse synthetic medical images that are representative of the true data distribution while remaining computationally efficient.

Few-Shot Generation of Brain Tumors, which proposes a decentralized few-shot generative model that synthesizes brain tumor images while fully preserving privacy, achieving Dice score improvements of 3.9% for data augmentation and 4.6% for fairness on a separate dataset.

Personalized Federated Training of Diffusion Models, which introduces a federated learning framework for training diffusion models on decentralized private datasets, outperforming non-collaborative training methods and effectively reducing biases and imbalances in synthetic data.

Benchmarking Synthetic Tabular Data, which presents an evaluation framework that quantifies how well synthetic data replicates the distributional properties of the original data while ensuring privacy, and which supports various data types and structures.
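To make the idea of checking synthetic tabular data for both fidelity and privacy concrete, here is a minimal sketch in Python. It is not the method of any paper listed above. The two function names are hypothetical, and the two measures are simple stand-ins chosen for illustration: total variation distance between a real and a synthetic categorical column as a fidelity check, and the share of synthetic rows that copy a real row verbatim as a crude privacy proxy.

```python
from collections import Counter


def total_variation_distance(real, synthetic):
    """Total variation distance between the empirical distributions of two
    categorical columns: 0.0 means identical, 1.0 means disjoint support.
    (Hypothetical helper, for illustration only.)"""
    if not real or not synthetic:
        raise ValueError("both columns must be non-empty")
    real_counts = Counter(real)
    synth_counts = Counter(synthetic)
    categories = set(real_counts) | set(synth_counts)
    # Counter returns 0 for categories missing from one side.
    return 0.5 * sum(
        abs(real_counts[c] / len(real) - synth_counts[c] / len(synthetic))
        for c in categories
    )


def exact_match_rate(real_rows, synthetic_rows):
    """Fraction of synthetic rows that reproduce some real row exactly.
    A crude privacy proxy (an assumption of this sketch), not a formal
    guarantee such as differential privacy."""
    if not synthetic_rows:
        raise ValueError("synthetic_rows must be non-empty")
    real_set = set(real_rows)  # rows must be hashable, e.g. tuples
    return sum(row in real_set for row in synthetic_rows) / len(synthetic_rows)


if __name__ == "__main__":
    real_col = ["a", "a", "b", "c"]
    synth_col = ["a", "b", "b", "b"]
    print(total_variation_distance(real_col, synth_col))  # 0.5

    real_rows = [("a", 1), ("b", 2)]
    synth_rows = [("a", 1), ("c", 3), ("b", 5), ("b", 2)]
    print(exact_match_rate(real_rows, synth_rows))  # 0.5
```

A low distance together with a low exact-match rate suggests that the synthetic column tracks the real distribution without copying records outright. A full evaluation would also need to cover numeric columns, cross-column dependencies, and attack-based privacy tests such as membership inference.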

Sources

PrivPetal: Relational Data Synthesis via Permutation Relations

Federated Self-Supervised Learning for One-Shot Cross-Modal and Cross-Imaging Technique Segmentation

Beyond a Single Mode: GAN Ensembles for Diverse Medical Data Generation

Few-Shot Generation of Brain Tumors for Secure and Fair Data Sharing

Personalized Federated Training of Diffusion Models with Privacy Guarantees

TAMIS: Tailored Membership Inference Attacks on Synthetic Data

From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis

Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework
