ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models
Abstract
Constructing stochastic processes that fully exploit combinatorial structure addresses a coverage limitation in diffusion model training, accelerating training and enabling asynchronous generation across dimensions and attributes.
In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, additional attributes are combined to associate with data samples. We show that the space spanned by the combination of dimensions and attributes can be insufficiently covered by existing training schemes of diffusion generative models, potentially limiting test time performance. We present a simple fix to this problem by constructing stochastic processes that fully exploit the combinatorial structures, hence the name ComboStoc. Using this simple strategy, we show that network training is significantly accelerated across diverse data modalities, including images and 3D structured shapes. Moreover, ComboStoc enables a new way of test time generation which uses asynchronous time steps for different dimensions and attributes, thus allowing for varying degrees of control over them. Our code is available at: https://github.com/Xrvitd/ComboStoc
Community
Today we're releasing ComboStoc, a simple new training strategy for diffusion generative models that unlocks faster training and more flexible control at test time.
Diffusion models usually treat each training sample as a point moving along a single path. But for high-dimensional data, and especially for structured generation tasks with additional attributes, that is often not enough. Large parts of the underlying combinatorial space remain poorly sampled during training, which can hurt generation quality at test time.
ComboStoc addresses this with a simple idea: instead of training only along a narrow set of paths, it constructs stochastic processes that more fully explore the combinatorial structures induced by dimensions and attributes. This leads to much better coverage of the training space, with no need for a complicated redesign of the model itself.
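The core idea can be sketched in a few lines (a minimal illustration with hypothetical names, not the authors' implementation): instead of drawing one scalar time t per training sample, draw a vector of times, one per dimension or attribute, so each coordinate is interpolated toward noise independently and far more of the combinatorial space gets visited.

```python
import numpy as np

def combinatorial_noising(x0, rng):
    """Interpolate each coordinate of a clean sample x0 toward noise
    with its own independently sampled time, instead of one shared t.
    A minimal sketch of per-dimension time sampling (hypothetical names)."""
    noise = rng.standard_normal(x0.shape)
    # One time value per coordinate; a synchronized scheme would use a scalar.
    t = rng.uniform(0.0, 1.0, size=x0.shape)
    # Linear (rectified-flow style) interpolation, applied per coordinate.
    xt = (1.0 - t) * x0 + t * noise
    return xt, t, noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))   # a toy batch of 4 samples, 8 dimensions
xt, t, noise = combinatorial_noising(x0, rng)
```

A synchronized scheme only ever sees states where all coordinates share the same noise level; the vectorized `t` above also exposes the network to mixed states, which is what the asynchronous test-time generation later relies on.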
The result is a diffusion framework that trains significantly faster across very different data modalities, including images and structured 3D shapes.
But ComboStoc is not only about speed.
It also enables a new test time generation scheme with asynchronous time steps across different dimensions or attributes. In practice, this means you can preserve some parts more strongly than others, or apply different levels of control across regions, components, or conditions within the same sample.
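A toy sketch of what asynchronous sampling can look like (not the paper's exact sampler; `velocity` is a stand-in for the trained network, here solved analytically for an all-zeros target so the example runs on its own): each dimension carries its own time, and a per-dimension schedule controls how fast it moves from noise toward data. A schedule entry of 0 freezes that dimension entirely.

```python
import numpy as np

def velocity(x, t):
    """Stand-in for a trained flow network: along the linear path
    x_t = (1 - t) * x0 + t * eps with target x0 = 0, the true
    velocity is eps = x_t / t. Purely illustrative."""
    return x / np.maximum(t, 1e-6)

def asynchronous_sample(shape, schedule, n_steps=50, seed=0):
    """Integrate each dimension along its own time schedule.
    schedule[d] scales how fast dimension d moves from noise (t = 1)
    to data (t = 0); a value of 0 keeps that dimension frozen."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    t = np.ones(shape)                # per-dimension times, all starting at noise
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        step = dt * schedule          # dims with schedule 0 never move
        x = x - step * velocity(x, t) # Euler step toward t = 0 (data)
        t = np.clip(t - step, 0.0, 1.0)
    return x, t

# Fully denoise the first four dimensions, keep the last four frozen as-is.
schedule = np.concatenate([np.ones(4), np.zeros(4)])
x, t = asynchronous_sample((8,), schedule)
```

The frozen dimensions retain their initial values while the others are fully regenerated, which is the mechanism behind preserving some parts of a sample more strongly than others.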
This opens up a new way to think about diffusion generation: not as one synchronized denoising process, but as a more flexible system where different parts of the data can evolve at different rates.
We think this perspective is especially promising for structured generation problems, where dimensions and attributes are deeply entangled and should not always be treated uniformly.
Faster training. Better path coverage. More controllable generation.
ComboStoc is a simple step toward diffusion models that make fuller use of the structure already present in the data.
Project Page: https://ruixu.me/html/ComboStoc/index.html
Paper: https://arxiv.org/abs/2405.13729
Code: https://github.com/Xrvitd/ComboStoc
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Three Creates All: You Only Sample 3 Steps (2026)
- Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation (2026)
- Repurposing Geometric Foundation Models for Multi-view Diffusion (2026)
- Exploring Time Conditioning in Diffusion Generative Models from Disjoint Noisy Data Manifolds (2026)
- LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling (2026)
- Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion (2026)
- Guiding a Diffusion Model by Swapping Its Tokens (2026)