My favourites - a Warvito Collection

Warvito 's Collections

My favourites

updated 2 days ago

Test-Time Scaling with Reflective Generative Model

Paper • 2507.01951 • Published Jul 2, 2025 • 107
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published Feb 7, 2025 • 151
Autoregressive Diffusion Models

Paper • 2110.02037 • Published Oct 5, 2021
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13, 2025 • 8
Improving the Diffusability of Autoencoders

Paper • 2502.14831 • Published Feb 20, 2025 • 2
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models

Paper • 2410.10733 • Published Oct 14, 2024 • 8
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space

Paper • 2508.00413 • Published Aug 1, 2025 • 5
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Paper • 2504.10483 • Published Apr 14, 2025 • 21
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

Paper • 2508.14444 • Published Aug 20, 2025 • 39
MetaCLIP 2: A Worldwide Scaling Recipe

Paper • 2507.22062 • Published Jul 29, 2025 • 36
Waver: Wave Your Way to Lifelike Video Generation

Paper • 2508.15761 • Published Aug 21, 2025 • 36
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 267
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26, 2025 • 36
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis

Paper • 2509.10441 • Published Sep 12, 2025 • 30
Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4, 2025 • 195
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Paper • 2509.08519 • Published Sep 10, 2025 • 128
Step1X-Edit: A Practical Framework for General Image Editing

Paper • 2504.17761 • Published Apr 24, 2025 • 92
Transition Matching: Scalable and Flexible Generative Modeling

Paper • 2506.23589 • Published Jun 30, 2025 • 1
MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21, 2025 • 97
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

Paper • 2506.23918 • Published Jun 30, 2025 • 89
Diffusion Beats Autoregressive in Data-Constrained Settings

Paper • 2507.15857 • Published Jul 21, 2025 • 1
Hierarchical Reasoning Model

Paper • 2506.21734 • Published Jun 26, 2025 • 46
UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward

Paper • 2509.06818 • Published Sep 8, 2025 • 29
Wan-Animate: Unified Character Animation and Replacement with Holistic Replication

Paper • 2509.14055 • Published Sep 17, 2025 • 17
Inpainting-Guided Policy Optimization for Diffusion Large Language Models

Paper • 2509.10396 • Published Sep 12, 2025 • 15
Lynx: Towards High-Fidelity Personalized Video Generation

Paper • 2509.15496 • Published Sep 19, 2025 • 12
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation

Paper • 2509.19296 • Published Sep 23, 2025 • 23
Video models are zero-shot learners and reasoners

Paper • 2509.20328 • Published Sep 24, 2025 • 99
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

Paper • 2509.19284 • Published Sep 23, 2025 • 22
Seedream 4.0: Toward Next-generation Multimodal Image Generation

Paper • 2509.20427 • Published Sep 24, 2025 • 82
Stochastic activations

Paper • 2509.22358 • Published Sep 26, 2025 • 2
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

Paper • 2509.24900 • Published Sep 29, 2025 • 53
Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13, 2025 • 165
WithAnyone: Towards Controllable and ID Consistent Image Generation

Paper • 2510.14975 • Published Oct 16, 2025 • 84
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3, 2024 • 74
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

Paper • 2510.20766 • Published Oct 23, 2025 • 34
Continuous Autoregressive Language Models

Paper • 2510.27688 • Published Oct 31, 2025 • 70
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Paper • 2511.09611 • Published Nov 12, 2025 • 69
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Paper • 2512.04677 • Published Dec 4, 2025 • 167
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

Paper • 2511.22699 • Published Nov 27, 2025 • 224
Vision Bridge Transformer at Scale

Paper • 2511.23199 • Published Nov 28, 2025 • 45
One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation

Paper • 2512.07829 • Published 29 days ago • 21
Towards Scalable Pre-training of Visual Tokenizers for Generation

Paper • 2512.13687 • Published 22 days ago • 99
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Paper • 2512.12967 • Published 23 days ago • 103
What matters for Representation Alignment: Global Information or Spatial Structure?

Paper • 2512.10794 • Published 26 days ago • 8
KlingAvatar 2.0 Technical Report

Paper • 2512.13313 • Published 22 days ago • 42