ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models
Abstract
Constructing stochastic processes that fully exploit combinatorial structure addresses a coverage limitation in diffusion model training, accelerating training and enabling asynchronous generation across dimensions and attributes.
In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, additional attributes are combined to associate with data samples. We show that the space spanned by the combination of dimensions and attributes can be insufficiently covered by existing training schemes of diffusion generative models, potentially limiting test time performance. We present a simple fix to this problem by constructing stochastic processes that fully exploit the combinatorial structures, hence the name ComboStoc. Using this simple strategy, we show that network training is significantly accelerated across diverse data modalities, including images and 3D structured shapes. Moreover, ComboStoc enables a new way of test time generation which uses asynchronous time steps for different dimensions and attributes, thus allowing for varying degrees of control over them. Our code is available at: https://github.com/Xrvitd/ComboStoc
Community
Today we're releasing ComboStoc, a simple new training strategy for diffusion generative models that unlocks faster training and more flexible control at test time.
Diffusion models usually treat each training sample as a point moving along a single path. But for high-dimensional data, and especially for structured generation tasks with additional attributes, that is often not enough. Large parts of the underlying combinatorial space remain poorly sampled during training, which can hurt generation quality at test time.
ComboStoc addresses this with a simple idea: instead of training only along a narrow set of paths, it constructs stochastic processes that more fully explore the combinatorial structures induced by dimensions and attributes. This leads to much better coverage of the training space, with no need for a complicated redesign of the model itself.
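The core idea can be sketched in a few lines (a minimal illustration with hypothetical names, not the authors' implementation): instead of drawing one scalar time t per training sample, draw a vector of times, one per dimension or attribute, so each coordinate is interpolated toward noise independently and far more of the combinatorial space gets visited.

```python
import numpy as np

def combinatorial_noising(x0, rng):
    """Interpolate each coordinate of a clean sample x0 toward noise
    with its own independently sampled time, instead of one shared t.
    A minimal sketch of per-dimension time sampling (hypothetical names)."""
    noise = rng.standard_normal(x0.shape)
    # One time value per coordinate; a synchronized scheme would use a scalar.
    t = rng.uniform(0.0, 1.0, size=x0.shape)
    # Linear (rectified-flow style) interpolation, applied per coordinate.
    xt = (1.0 - t) * x0 + t * noise
    return xt, t, noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))   # a toy batch of 4 samples, 8 dimensions
xt, t, noise = combinatorial_noising(x0, rng)
```

A synchronized scheme only ever sees states where all coordinates share the same noise level; the vectorized `t` above also exposes the network to mixed states, which is what the asynchronous test-time generation later relies on.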
The result is a diffusion framework that trains significantly faster across very different data modalities, including images and structured 3D shapes.
But ComboStoc is not only about speed.
It also enables a new test time generation scheme with asynchronous time steps across different dimensions or attributes. In practice, this means you can preserve some parts more strongly than others, or apply different levels of control across regions, components, or conditions within the same sample.
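A toy sketch of what asynchronous sampling can look like (not the paper's exact sampler; `velocity` is a stand-in for the trained network, here solved analytically for an all-zeros target so the example runs on its own): each dimension carries its own time, and a per-dimension schedule controls how fast it moves from noise toward data. A schedule entry of 0 freezes that dimension entirely.

```python
import numpy as np

def velocity(x, t):
    """Stand-in for a trained flow network: along the linear path
    x_t = (1 - t) * x0 + t * eps with target x0 = 0, the true
    velocity is eps = x_t / t. Purely illustrative."""
    return x / np.maximum(t, 1e-6)

def asynchronous_sample(shape, schedule, n_steps=50, seed=0):
    """Integrate each dimension along its own time schedule.
    schedule[d] scales how fast dimension d moves from noise (t = 1)
    to data (t = 0); a value of 0 keeps that dimension frozen."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    t = np.ones(shape)                # per-dimension times, all starting at noise
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        step = dt * schedule          # dims with schedule 0 never move
        x = x - step * velocity(x, t) # Euler step toward t = 0 (data)
        t = np.clip(t - step, 0.0, 1.0)
    return x, t

# Fully denoise the first four dimensions, keep the last four frozen as-is.
schedule = np.concatenate([np.ones(4), np.zeros(4)])
x, t = asynchronous_sample((8,), schedule)
```

The frozen dimensions retain their initial values while the others are fully regenerated, which is the mechanism behind preserving some parts of a sample more strongly than others.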
This opens up a new way to think about diffusion generation: not as one synchronized denoising process, but as a more flexible system where different parts of the data can evolve at different rates.
We think this perspective is especially promising for structured generation problems, where dimensions and attributes are deeply entangled and should not always be treated uniformly.
Faster training. Better path coverage. More controllable generation.
ComboStoc is a simple step toward diffusion models that make fuller use of the structure already present in the data.
Project Page: https://ruixu.me/html/ComboStoc/index.html
Paper: https://arxiv.org/abs/2405.13729
Code: https://github.com/Xrvitd/ComboStoc
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Three Creates All: You Only Sample 3 Steps (2026)
- Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation (2026)
- Repurposing Geometric Foundation Models for Multi-view Diffusion (2026)
- Exploring Time Conditioning in Diffusion Generative Models from Disjoint Noisy Data Manifolds (2026)
- LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling (2026)
- Control-DINO: Feature Space Conditioning for Controllable Image-to-Video Diffusion (2026)
- Guiding a Diffusion Model by Swapping Its Tokens (2026)