Add comprehensive model card for Guidance-Free Training (GFT) with metadata
#1 · opened by nielsr (HF Staff)

README.md CHANGED
---
license: mit
pipeline_tag: text-to-image
library_name: diffusers
---

# Guidance-Free Training (GFT): Visual Generation Without Guidance

This repository contains checkpoints and code for **Guidance-Free Training (GFT)**, a novel approach for visual generative models presented in the paper [Visual Generation Without Guidance](https://huggingface.co/papers/2501.15420). GFT eliminates the need for Classifier-Free Guidance (CFG) during sampling, effectively halving the computational cost of inference while matching or surpassing CFG's performance.

Unlike previous distillation-based approaches, GFT enables training directly from scratch and requires minimal modifications to existing codebases. It is a universal algorithm applicable across various visual generative models, including diffusion, autoregressive, and masked-prediction architectures.

**Paper:** [Visual Generation Without Guidance](https://huggingface.co/papers/2501.15420)  
**GitHub Repository:** [thu-ml/GFT](https://github.com/thu-ml/GFT)

<p align="center">
  <img src="https://github.com/thu-ml/GFT/raw/main/GFT.png" alt="GFT comparison" style="width:80%;">
</p>
<p align="center">
  <b>Qualitative text-to-image comparison between vanilla conditional generation, GFT, and CFG on Stable Diffusion 1.5, with the prompt "Elegant crystal vase holding pink peonies, soft raindrops tracing paths down the window behind it".</b>
</p>

## Key Features

* **Highly Efficient**: GFT reduces sampling to a single model inference, effectively halving the computational cost compared to CFG.
* **Minimal Modifications**: It requires fewer than 10 lines of code changes to existing visual generative model codebases, inheriting most design choices and hyperparameters.
* **Universal Applicability**: GFT is highly versatile, working across diverse visual generative models, including diffusion, flow-based, autoregressive, and masked-prediction architectures.
* **Training from Scratch**: Unlike distillation methods, GFT enables direct training of guidance-free models from scratch.
* **Performance Match**: GFT consistently achieves FID scores comparable to or lower than CFG baselines, with similar diversity-fidelity trade-offs.
* **Flexible Sampling**: It allows adjusting the sampling temperature with only a single model.

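The single-pass and temperature bullets above follow from GFT's core trick: train the sampling model directly and *define* the conditional model as a linear interpolation between it and an unconditional prediction, so the usual two-pass CFG combination collapses algebraically into one forward pass. Here is a toy NumPy sketch of that identity; the variable names and exact parameterization are our paraphrase of the paper, not code from this repository:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for one denoising step's noise predictions.
eps_uncond = rng.normal(size=4)  # unconditional model output
eps_sample = rng.normal(size=4)  # GFT sampling model output (one forward pass)

beta = 0.5  # GFT "temperature"; plays the role of 1/w for CFG scale w

# GFT's implicitly defined conditional model: an interpolation,
# not a separately trained network.
eps_cond = beta * eps_sample + (1.0 - beta) * eps_uncond

# Classic CFG needs BOTH predictions (two forward passes) at sampling time:
w = 1.0 / beta
eps_cfg = eps_uncond + w * (eps_cond - eps_uncond)

# ...but under the GFT parameterization it equals the single-pass prediction:
assert np.allclose(eps_cfg, eps_sample)
```

Because `beta` is exposed during training, it can be varied at inference to trade diversity for fidelity without ever running a second model.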
<p align="center">
  <img src="https://github.com/thu-ml/GFT/raw/main/temperature.png" alt="temperature control" style="width:80%;">
</p>
<p align="center">
  <b>GFT allows adjusting the sampling temperature of visual generation with only a single model.</b>
</p>

## Usage and Pretrained Checkpoints

The project provides training code and pretrained guidance-free checkpoints for several model families:

* DiT models
* Stable Diffusion 1.5 models (e.g., [SD1.5-GF-finetune](https://huggingface.co/aaa-ceku7/GFT/tree/main/SD1.5-GF-finetune))
* LlamaGen models

For detailed implementation, training instructions, and example usage, please refer to the respective directories within the [GitHub repository](https://github.com/thu-ml/GFT).
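If the SD1.5 checkpoint is in standard diffusers format, sampling should look roughly like the sketch below. The repo id, `subfolder` layout, and diffusers compatibility are assumptions on our part (the GitHub repository is authoritative); the point is that a guidance-free model is sampled with `guidance_scale=1.0`, i.e. a single UNet forward pass per denoising step instead of CFG's two:

```python
def load_guidance_free_sd15(repo_id="aaa-ceku7/GFT",
                            subfolder="SD1.5-GF-finetune",
                            device="cuda"):
    """Load the guidance-free SD1.5 checkpoint (repo layout assumed)."""
    import torch
    from diffusers import StableDiffusionPipeline  # deferred heavy imports

    pipe = StableDiffusionPipeline.from_pretrained(
        repo_id, subfolder=subfolder, torch_dtype=torch.float16
    )
    return pipe.to(device)


def sample(pipe, prompt):
    # guidance_scale=1.0 disables classifier-free guidance in diffusers,
    # so each denoising step costs one forward pass.
    return pipe(prompt, guidance_scale=1.0, num_inference_steps=50).images[0]
```

For example, `sample(load_guidance_free_sd15(), "a crystal vase with pink peonies")` would generate an image without any CFG overhead.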

## Citation

If you find our project helpful, please consider citing:

```bibtex
@article{chen2025visual,
  title={Visual Generation Without Guidance},
  author={Chen, Huayu and Jiang, Kai and Zheng, Kaiwen and Chen, Jianfei and Su, Hang and Zhu, Jun},
  journal={arXiv preprint arXiv:2501.15420},
  year={2025}
}
```