Model Collapse
Concept page for degenerative distributional drift in recursive model training.
Model Collapse is a degenerative process in which a model trained recursively on generated or biased data loses information about the original data distribution. Collapse can appear as mode loss, reduced diversity, distorted class proportions, or worsening sample quality over generations.[1]
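The recursive process described above can be shown in a minimal numerical sketch (an illustration only; the sample size and generation count are arbitrary choices, not taken from the cited work): a one-Gaussian "model" is refit, generation after generation, on samples drawn from its own previous fit, and the estimated spread shrinks toward zero, a toy instance of lost diversity.

```python
import numpy as np

# Toy model-collapse loop (illustrative assumptions throughout).
# Each generation fits a Gaussian to N samples drawn from the previous
# generation's fit, then serves as the data source for the next one.
rng = np.random.default_rng(0)
N = 10                  # samples per generation (small, so drift is fast)
mu, sigma = 0.0, 1.0    # the original data distribution

stds = []
for generation in range(500):
    samples = rng.normal(mu, sigma, size=N)    # recursively generated data
    mu, sigma = samples.mean(), samples.std()  # refit on generated data only
    stds.append(sigma)

print(f"fitted std, generation 1:   {stds[0]:.4f}")
print(f"fitted std, generation 500: {stds[-1]:.2e}")  # far below the true 1.0
```

Because each refit uses only a finite sample of its own output, the estimated standard deviation follows a multiplicative random walk with a downward bias, so information about the original spread is progressively lost even though no step looks individually wrong.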
Role in this wiki
The page provides the failure concept for the broader Synthetic Data topic. It is written separately because "synthetic data" is not automatically bad: the failure depends on how generated data are selected, mixed, and reused. Model collapse is therefore the negative endpoint that motivates careful data governance and collaborative verification.
Connection to Qiao's work
Qiao's ICML 2026 paper studies when sample-selection bias precipitates collapse. The work is connected to Wasserstein geometry because distributional distances can provide signals about drift, and to data silos because no single party may have the full distribution. In the biography, model collapse is part of Qiao's broader reliability agenda: data processes can silently degrade models even when the model architecture remains unchanged.
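One way distributional distances can serve as drift signals, as mentioned above, is to compare each generation's output against a held-out reference sample. The sketch below (an assumption for illustration, not Qiao's actual method; the distributions, sample sizes, and alarm comparison are all hypothetical) uses the one-dimensional Wasserstein-1 distance, which for equal-size empirical samples equals the mean absolute difference of the sorted values.

```python
import numpy as np

# Hypothetical drift monitor: compare model output to a held-out
# reference sample via the 1-D Wasserstein-1 distance. For two
# equal-size 1-D empirical samples, W1 is exactly the mean absolute
# difference of their sorted values.
def w1(x, y):
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

rng = np.random.default_rng(0)
n = 2000
reference = rng.normal(0.0, 1.0, size=n)   # held-out original data

healthy = rng.normal(0.0, 1.0, size=n)     # model still on-distribution
collapsed = rng.normal(0.0, 0.2, size=n)   # variance lost to collapse

baseline = w1(reference, healthy)    # small: sampling noise only
alarm = w1(reference, collapsed)     # large: diversity has shrunk

print(f"baseline W1:  {baseline:.3f}")
print(f"collapsed W1: {alarm:.3f}")
```

The sorted-samples formula is exact only in one dimension with equal sample sizes; higher-dimensional or unequal-size settings need a general optimal-transport solver. The siloed-data point above also matters here: if no single party holds the reference distribution, even computing such a drift signal requires collaboration.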
See also
- When Sample Selection Bias Precipitates Model Collapse
- Recursive Synthetic Data Training
- Sample Selection Bias
- Collaborative Evaluation
Footnotes
1. Shumailov et al. define model collapse in the context of recursively generated data and report the phenomenon across language models, variational autoencoders, and Gaussian mixture models.