Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published Dec 9, 2025 • 132
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27, 2025 • 178
DreamOmni2: Multimodal Instruction-based Editing and Generation Paper • 2510.06679 • Published Oct 8, 2025 • 73
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech Paper • 2509.25131 • Published Sep 29, 2025 • 16
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26, 2025 • 185
MGM-Omni Collection MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech • 18 items • Updated Oct 11, 2025 • 11
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17, 2025 • 79
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024 • 48
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 117
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 704
MGM-Data Collection Official data collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 2 items • Updated Apr 21, 2024 • 7
MGM Collection Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated May 3, 2024 • 47
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 47