The field of medical data generation is moving toward more sophisticated and targeted approaches, with a focus on improving the quality and usefulness of synthetic data for downstream clinical models. Researchers are exploring generative models and frameworks that capture complex dependencies and relationships in medical data and that optimize synthetic samples for specific clinical tasks. A key challenge is ensuring that synthetic data generalizes across different healthcare settings and populations, and several studies highlight the importance of preserving realistic distributions and correlations in synthetic data. Two papers stand out. Auto-FEDUS introduces an autoregressive generative model that maps fetal electrocardiogram signals to the corresponding Doppler ultrasound waveforms and reports state-of-the-art performance in generating realistic synthetic signals. TarDiff proposes a target-oriented diffusion framework for generating synthetic Electronic Health Record (EHR) time-series data and reports significant improvements in downstream model performance compared to existing methods.
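The paragraph above stresses preserving realistic distributions and correlations. One way to make that concrete is a simple fidelity check that compares per-feature means and standard deviations, along with pairwise Pearson correlations, between a real and a synthetic dataset. The following is a minimal, generic sketch that assumes tabular numeric features stored as a dict of equal-length lists. It is not the evaluation protocol of Auto-FEDUS or TarDiff, and the function names (`pearson`, `correlation_matrix`, `fidelity_report`) are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        # Constant column: correlation is undefined; this sketch treats it as 0.
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a list of columns."""
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]


def fidelity_report(real, synthetic):
    """Compare marginal statistics and pairwise correlations.

    real, synthetic: hypothetical dicts mapping feature name -> list of floats.
    Returns (per-feature stats, mean absolute correlation gap).
    """
    names = sorted(real)
    stats = {}
    for n in names:
        # Absolute gaps in the marginal mean and (population) standard deviation.
        stats[n] = {
            "mean_diff": abs(mean(real[n]) - mean(synthetic[n])),
            "std_diff": abs(pstdev(real[n]) - pstdev(synthetic[n])),
        }
    cr = correlation_matrix([real[n] for n in names])
    cs = correlation_matrix([synthetic[n] for n in names])
    k = len(names)
    # Mean absolute difference over the upper-triangular (i < j) entries.
    gaps = [abs(cr[i][j] - cs[i][j]) for i in range(k) for j in range(i + 1, k)]
    corr_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return stats, corr_gap


if __name__ == "__main__":
    real = {"hr": [70.0, 80.0, 90.0, 100.0], "sbp": [110.0, 120.0, 130.0, 140.0]}
    # Same marginals, but the hr/sbp relationship is reversed.
    synthetic = {"hr": [70.0, 80.0, 90.0, 100.0], "sbp": [140.0, 130.0, 120.0, 110.0]}
    stats, gap = fidelity_report(real, synthetic)
    print(stats, gap)
```

In the usage example, every marginal statistic matches exactly, yet the correlation gap is about 2.0 because a positive relationship became a negative one. This is the kind of failure that checking marginals alone would miss.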
Advances in Synthetic Medical Data Generation
Sources
Auto-FEDUS: Autoregressive Generative Modeling of Doppler Ultrasound Signals from Fetal Electrocardiograms
A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs