We study generative models for structured data such as tables and time series. Our work spans generative adversarial networks (GANs), large language models (LLMs), and diffusion models, with a focus on synthetic data quality, downstream utility, and privacy. We also explore federated and decentralized settings, which commonly arise in real-world collaborative scenarios.

–

Publications

–

WaveStitch: Flexible and Fast Conditional Time Series Generation with Diffusion Models

Aditya Shankar, Lydia Y. Chen, Arie van Deursen and Rihan Hai

ACM SIGMOD 2026: 📄 Paper

Citation
@article{shankar2025wavestitch,
    author  = {Aditya Shankar and
               Lydia Y. Chen and
               Arie van Deursen and
               Rihan Hai},
    title   = {WaveStitch: Flexible and Fast Conditional Time Series Generation with Diffusion Models},
    journal = {CoRR},
    volume  = {abs/2503.06231},
    year    = {2025}
}

Federated Time Series Generation on Feature and Temporally Misaligned Data

Zhi Wen Soi, Chenrui Fan, Aditya Shankar, Abele Mălan, Lydia Y. Chen

ECML PKDD 2025: 📄 Paper | 💻 Code

Citation
@inproceedings{soi2025fedtdd,
    author      = {Zhi Wen Soi and
                   Chenrui Fan and
                   Aditya Shankar and
                   Abele Mălan and
                   Lydia Y. Chen},
    title       = {Federated Time Series Generation on Feature and Temporally Misaligned Data},
    booktitle   = {Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, {ECML} {PKDD} 2025},
    year        = {2025}
}

TabuLa: Harnessing Language Models for Tabular Data Synthesis

Zilong Zhao, Robert Birke, and Lydia Y. Chen

PAKDD 2025: 📄 Paper | 💻 Code

Citation
@inproceedings{zhao2025stv,
    author      = {Zilong Zhao and
                   Robert Birke and
                   Lydia Y. Chen},
    title       = {TabuLa: Harnessing Language Models for Tabular Data Synthesis},
    booktitle   = {Advances in Knowledge Discovery and Data Mining - 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, {PAKDD} 2025},
    series      = {Lecture Notes in Computer Science},
    publisher   = {Springer},
    year        = {2025},
    doi         = {10.1007/978-981-96-8186-0\_20}
}

GTV: Generating Tabular Data via Vertical Federated Learning

Zilong Zhao, Han Wu, Aad van Moorsel, Lydia Y. Chen

DSN 2025: 📄 Paper | 💻 Code

Citation
@inproceedings{zhao2025gtv,
  title={{GTV}: Generating Tabular Data via Vertical Federated Learning},
  author={Zhao, Zilong and Wu, Han and van Moorsel, Aad and Chen, Lydia Y.},
  booktitle={2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)},
  pages={33--46},
  year={2025},
  organization={IEEE}
}

SiloFuse: Cross-Silo Synthetic Data Generation with Latent Tabular Diffusion Models

Aditya Shankar, Hans Brouwer, Rihan Hai, Lydia Y. Chen

ICDE 2024: 📄 Paper

Citation
@INPROCEEDINGS{10597707,
  author={Shankar, Aditya and Brouwer, Hans and Hai, Rihan and Chen, Lydia},
  booktitle={2024 IEEE 40th International Conference on Data Engineering (ICDE)},
  title={SiloFuse: Cross-silo Synthetic Data Generation with Latent Tabular Diffusion Models},
  year={2024},
  pages={110--123},
  keywords={Training;Resistance;Privacy;Data privacy;Costs;Synthesizers;Memory;Encoding;Task analysis;Synthetic data;Distributed databases;Distributed training},
  doi={10.1109/ICDE60146.2024.00016}
}

CTAB-GAN: Effective Table Data Synthesizing

Zilong Zhao, Aditya Kunar, Robert Birke, Lydia Y. Chen

ACML 2021: 📄 Paper | 💻 Code

Citation
@inproceedings{zhao2021ctab,
  title={{CTAB-GAN}: Effective Table Data Synthesizing},
  author={Zhao, Zilong and Kunar, Aditya and Birke, Robert and Chen, Lydia Y},
  booktitle={Asian Conference on Machine Learning},
  pages={97--112},
  year={2021},
  organization={PMLR}
}