My favourites
Test-Time Scaling with Reflective Generative Model
Paper
•
2507.01951
•
Published
•
107
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper
•
2502.05171
•
Published
•
151
Autoregressive Diffusion Models
Paper
•
2110.02037
•
Published
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Paper
•
2502.09509
•
Published
•
8
Improving the Diffusability of Autoencoders
Paper
•
2502.14831
•
Published
•
2
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
Paper
•
2410.10733
•
Published
•
8
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space
Paper
•
2508.00413
•
Published
•
5
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
Paper
•
2504.10483
•
Published
•
21
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Paper
•
2508.14444
•
Published
•
39
MetaCLIP 2: A Worldwide Scaling Recipe
Paper
•
2507.22062
•
Published
•
36
Waver: Wave Your Way to Lifelike Video Generation
Paper
•
2508.15761
•
Published
•
36
Qwen-Image Technical Report
Paper
•
2508.02324
•
Published
•
267
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
Paper
•
2508.18756
•
Published
•
36
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Paper
•
2509.10441
•
Published
•
30
Why Language Models Hallucinate
Paper
•
2509.04664
•
Published
•
195
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
Paper
•
2509.08519
•
Published
•
128
Step1X-Edit: A Practical Framework for General Image Editing
Paper
•
2504.17761
•
Published
•
92
Transition Matching: Scalable and Flexible Generative Modeling
Paper
•
2506.23589
•
Published
•
1
MMaDA: Multimodal Large Diffusion Language Models
Paper
•
2505.15809
•
Published
•
97
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
Paper
•
2506.23918
•
Published
•
89
Diffusion Beats Autoregressive in Data-Constrained Settings
Paper
•
2507.15857
•
Published
•
1
Hierarchical Reasoning Model
Paper
•
2506.21734
•
Published
•
46
UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
Paper
•
2509.06818
•
Published
•
29
Wan-Animate: Unified Character Animation and Replacement with Holistic Replication
Paper
•
2509.14055
•
Published
•
17
Inpainting-Guided Policy Optimization for Diffusion Large Language Models
Paper
•
2509.10396
•
Published
•
15
Lynx: Towards High-Fidelity Personalized Video Generation
Paper
•
2509.15496
•
Published
•
12
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
Paper
•
2509.19296
•
Published
•
23
Video models are zero-shot learners and reasoners
Paper
•
2509.20328
•
Published
•
99
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
Paper
•
2509.19284
•
Published
•
22
Seedream 4.0: Toward Next-generation Multimodal Image Generation
Paper
•
2509.20427
•
Published
•
82
Paper
•
2509.22358
•
Published
•
2
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
Paper
•
2509.24900
•
Published
•
53
Diffusion Transformers with Representation Autoencoders
Paper
•
2510.11690
•
Published
•
165
WithAnyone: Towards Controllable and ID Consistent Image Generation
Paper
•
2510.14975
•
Published
•
84
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper
•
2404.02905
•
Published
•
74
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion
Paper
•
2510.20766
•
Published
•
34
Continuous Autoregressive Language Models
Paper
•
2510.27688
•
Published
•
70
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
Paper
•
2511.09611
•
Published
•
69
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
Paper
•
2512.04677
•
Published
•
167
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper
•
2511.22699
•
Published
•
224
Vision Bridge Transformer at Scale
Paper
•
2511.23199
•
Published
•
45
One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation
Paper
•
2512.07829
•
Published
•
21
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper
•
2512.13687
•
Published
•
99
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
Paper
•
2512.12967
•
Published
•
103
What matters for Representation Alignment: Global Information or Spatial Structure?
Paper
•
2512.10794
•
Published
•
8
KlingAvatar 2.0 Technical Report
Paper
•
2512.13313
•
Published
•
42