	---
	language:
	- en
	library_name: glyph-byt5
	---

	# Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

	We introduce Glyph-ByT5-v2, a customized text encoder for accurate multilingual visual text rendering with improved aesthetics.
	As an extension of Glyph-SDXL, the multilingual version supports visual text rendering in 10 languages: English, Chinese, Japanese, Korean, French, German, Spanish, Italian, Portuguese, and Russian.
	Combined with SDXL, the resulting Glyph-SDXL-v2 renders accurate multilingual visual text in design images.


	> [Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering](https://glyph-byt5-v2.github.io/)
	> [Zeyu Liu](https://github.com/lzy-tony), [Weicong Liang](https://scholar.google.com/citations?user=QvHDIygAAAAJ&hl=zh-CN), [Yiming Zhao](https://scholar.google.com.hk/citations?user=_knPaYsAAAAJ&hl=zh-CN), [Bohan Chen](https://github.com/BHCHENGIT), [Ji Li](https://sites.google.com/a/usc.edu/jili/), [Yuhui Yuan](https://www.microsoft.com/en-us/research/people/yuyua/)
	> Microsoft Research Asia; Tsinghua University; Peking University; University of Liverpool
	> Preprint

	## Model Sources

	<!-- Provide the basic links for the model. -->

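	- Repository: [https://github.com/AIGText/Glyph-ByT5](https://github.com/AIGText/Glyph-ByT5)
	- Paper: [https://arxiv.org/abs/2406.10208](https://arxiv.org/abs/2406.10208)
	- Project Page: [https://glyph-byt5-v2.github.io/](https://glyph-byt5-v2.github.io/)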


	## Model Description

	Please see our [paper](https://arxiv.org/abs/2406.10208) and [project page](https://glyph-byt5-v2.github.io/) for more details. Detailed usage instructions and inference code are available [here](https://github.com/AIGText/Glyph-ByT5).

	## Visualization

	<table>
	<tr>
	<td><img src="assets/teaser/teaser_multilingual_1.webp" alt="example 1" width="200"/></td>
	<td><img src="assets/teaser/teaser_multilingual_2.webp" alt="example 2" width="200"/></td>
	<td><img src="assets/teaser/teaser_multilingual_3.webp" alt="example 3" width="200"/></td>
	<td><img src="assets/teaser/teaser_multilingual_4.webp" alt="example 4" width="200"/></td>
	</tr>
	</table>

	## Quick Usage

	```
	python inference_v2.py configs/glyph_sdxl_v2_albedo.py checkpoints examples/xiaoman.json --out_folder work_dirs/xiaoman --device cuda --sampler dpm
	```
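When the command above needs to be launched from a script (for example, to render several annotation files in a row), it can help to assemble it programmatically. The following is a minimal sketch, not part of the repository: `build_inference_command` and `run_inference` are hypothetical helper names, and only the positional arguments and flags shown in the command above are used.

```python
import shlex
import subprocess


def build_inference_command(config, checkpoint_dir, annotation, out_folder,
                            device="cuda", sampler="dpm"):
    """Return the Quick Usage command as an argument list.

    Hypothetical helper: it only mirrors the positional arguments and the
    --out_folder / --device / --sampler flags shown in the README command.
    """
    return [
        "python", "inference_v2.py",
        config, checkpoint_dir, annotation,
        "--out_folder", out_folder,
        "--device", device,
        "--sampler", sampler,
    ]


def run_inference(cmd):
    """Run the command from the repository root and fail on a non-zero exit."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_inference_command(
        "configs/glyph_sdxl_v2_albedo.py",
        "checkpoints",
        "examples/xiaoman.json",
        "work_dirs/xiaoman",
    )
    # Print the shell-equivalent command; call run_inference(cmd) to execute it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.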

	## More Configurations

	Some additional useful configuration options:

	| Argument/Config | Place | Default | Description |
	| ----------------------------- | ---------- | ----------------------------------- | ------------------------------------------------------------ |
	| cfg | argument | 5.0 | Classifier-free guidance scale |
	| sampler | argument | dpm | Sampler; supports dpm (DPM++ 2M Karras) and euler (EulerDiscreteScheduler) |
	| pretrained_model_name_or_path | config | stablediffusionapi/albedobase-xl-20 | Base model |
	| seed | annotation | None | Seed for inference |
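The options in the table live in different places: `cfg` and `sampler` are command-line arguments, while `seed` is set in the annotation file. The sketch below shows one way to handle both from Python. It rests on assumptions that are not confirmed by this README: that the annotation JSON is a top-level object with a `seed` key, and that the guidance flag is spelled `--cfg` by analogy with `--sampler`. `write_seeded_annotation` and `option_flags` are hypothetical names.

```python
import json
from pathlib import Path

# Sampler names taken from the table above.
SUPPORTED_SAMPLERS = ("dpm", "euler")


def write_seeded_annotation(src, dst, seed):
    """Copy an annotation JSON file to dst with a fixed seed.

    ASSUMPTION: the annotation is a JSON object and the seed is stored
    under a top-level "seed" key; check the repository examples first.
    """
    data = json.loads(Path(src).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise TypeError("expected the annotation to be a JSON object")
    data["seed"] = seed
    Path(dst).write_text(
        json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return data


def option_flags(sampler="dpm", cfg=5.0):
    """Build extra command-line flags using the defaults from the table.

    ASSUMPTION: the guidance argument is spelled --cfg.
    """
    if sampler not in SUPPORTED_SAMPLERS:
        raise ValueError(f"sampler must be one of {SUPPORTED_SAMPLERS}")
    return ["--sampler", sampler, "--cfg", str(cfg)]
```

Writing a seeded copy rather than editing the example in place keeps the original annotation unchanged, which makes it easier to compare outputs across seeds.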


	## Citation

	If you find our work useful in your research, please consider citing:

	```
	@misc{liu2024glyphbyt5v2,
	title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering},
	author={Zeyu Liu and Weicong Liang and Yiming Zhao and Bohan Chen and Ji Li and Yuhui Yuan},
	year={2024},
	eprint={2406.10208},
	archivePrefix={arXiv},
	primaryClass={cs.CV}
	}
	```

	and

	```
	@misc{liu2024glyphbyt5,
	title={Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering},
	author={Zeyu Liu and Weicong Liang and Zhanhao Liang and Chong Luo and Ji Li and Gao Huang and Yuhui Yuan},
	year={2024},
	eprint={2403.09622},
	archivePrefix={arXiv},
	primaryClass={cs.CV}
	}
	```