[NeurIPS 2025] OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions
Model Description
The three released models (OmniVCus-2.1-1.3B, OmniVCus-2.1-14B, and OmniVCus-2.2-14B) support multimodal-control video customization tasks, including reference-to-video, reference-mask-to-video, reference-depth-to-video, and reference-instruction-to-video generation. They are built on Wan2.1-1.3B, Wan2.1-14B, Wan2.2-14B, and VACE. Below are comparisons with the state-of-the-art method VACE on video customization:
· (a) 2.1-1.3B model
(a1) Prompt: "a woman rolling up a fitted sheet". Conditions: Reference Image, Depth Video. Results: VACE-2.1-1.3B vs. OmniVCus-2.1-1.3B (Ours).
(a2) Prompt: "a church in the winter". Conditions: Reference Image, Mask Video. Results: VACE-2.1-1.3B vs. OmniVCus-2.1-1.3B (Ours).
· (b) 2.1-14B model
(b1) Prompt: "a man holding a piece of paper in his hands". Conditions: Reference Image, Depth Video. Results: VACE-2.1-14B vs. OmniVCus-2.1-14B (Ours).
(b2) Prompt: "a boy in a medical gown and hairnet in a hospital room". Conditions: Reference Image, Mask Video. Results: VACE-2.1-14B vs. OmniVCus-2.1-14B (Ours).
· (c) 2.2-14B model
(c1) Prompt: "a boy looking into an open refrigerator, with tomatoes and a bottle of water on the floor". Conditions: Reference Image, Depth Video. Results: VACE-2.2-14B vs. OmniVCus-2.2-14B (Ours).
(c2) Prompt: "a woman standing in a room". Conditions: Reference Image, Mask Video. Results: VACE-2.2-14B vs. OmniVCus-2.2-14B (Ours).
GitHub Code Link
Please refer to our GitHub repo for detailed instructions on using our code and models.
https://github.com/caiyuanhao1998/Open-OmniVCus
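As a minimal sketch using only the standard huggingface_hub API (the local path below is illustrative, and the actual inference entry point is documented in the GitHub repo, not assumed here), the checkpoints on this page can be fetched like this:

```python
# Minimal sketch: download the OmniVCus checkpoints from the Hub.
# Repo IDs come from this model card; the inference commands themselves
# are documented in the Open-OmniVCus GitHub repository.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="CaiYuanhao/OmniVCus",       # this model repo
    local_dir="checkpoints/OmniVCus",    # illustrative local path
)
print("OmniVCus checkpoints downloaded to:", ckpt_dir)

# Then clone the code and follow its README for training/inference:
#   git clone https://github.com/caiyuanhao1998/Open-OmniVCus
```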
Training Data Link
Our models are trained on our curated dataset:
https://huggingface.co/datasets/CaiYuanhao/OmniVCus-Train
Testing Data Link
We provide 648 data samples for testing our models:
https://huggingface.co/datasets/CaiYuanhao/OmniVCus-Test
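As a hedged example (the dataset repo IDs are taken from the links above; the local directory names are illustrative), both the training set and the 648-sample testing set can be downloaded with huggingface_hub:

```python
# Sketch: download the curated training set and the test set from the Hub.
from huggingface_hub import snapshot_download

train_dir = snapshot_download(
    repo_id="CaiYuanhao/OmniVCus-Train",
    repo_type="dataset",
    local_dir="data/OmniVCus-Train",   # illustrative path
)
test_dir = snapshot_download(
    repo_id="CaiYuanhao/OmniVCus-Test",
    repo_type="dataset",
    local_dir="data/OmniVCus-Test",    # illustrative path
)
print("Training data:", train_dir)
print("Testing data:", test_dir)
```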
Project Page Link
For more video customization results, please refer to our project page:
https://caiyuanhao1998.github.io/project/OmniVCus/
arXiv Paper Link
For more technical details, please refer to our NeurIPS 2025 paper:
https://arxiv.org/abs/2506.23361
Citation
If you find our code, data, and models useful, please consider citing our paper:
@inproceedings{omnivcus,
  title={OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions},
  author={Yuanhao Cai and He Zhang and Xi Chen and Jinbo Xing and Kai Zhang and Yiwei Hu and Yuqian Zhou and Zhifei Zhang and Soo Ye Kim and Tianyu Wang and Yulun Zhang and Xiaokang Yang and Zhe Lin and Alan Yuille},
  booktitle={NeurIPS},
  year={2025}
}