Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2510.23451

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published Oct 30, 2025 • 111
RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging

Paper • 2510.20479 • Published Oct 23, 2025 • 12
A Definition of AGI

Paper • 2510.18212 • Published Oct 21, 2025 • 36
Video-As-Prompt: Unified Semantic Control for Video Generation

Paper • 2510.20888 • Published Oct 23, 2025 • 50

RL makes MLLMs see better than SFT

Paper • 2510.16333 • Published Oct 18, 2025 • 49
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback

Paper • 2510.16888 • Published Oct 19, 2025 • 22
Reasoning with Sampling: Your Base Model is Smarter Than You Think

Paper • 2510.14901 • Published Oct 16, 2025 • 48
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published Oct 24, 2025 • 31

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Paper • 2510.18019 • Published Oct 20, 2025 • 18
PORTool: Tool-Use LLM Training with Rewarded Tree

Paper • 2510.26020 • Published Oct 29, 2025 • 5
POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

Paper • 2510.24992 • Published Oct 28, 2025 • 4
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

Paper • 2510.24821 • Published Oct 28, 2025 • 41

Multimodal Reasoning

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17, 2025 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4, 2025 • 23
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17, 2025 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published Oct 30, 2025 • 111
RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging

Paper • 2510.20479 • Published Oct 23, 2025 • 12
A Definition of AGI

Paper • 2510.18212 • Published Oct 21, 2025 • 36
Video-As-Prompt: Unified Semantic Control for Video Generation

Paper • 2510.20888 • Published Oct 23, 2025 • 50

Is Multilingual LLM Watermarking Truly Multilingual? A Simple Back-Translation Solution

Paper • 2510.18019 • Published Oct 20, 2025 • 18
PORTool: Tool-Use LLM Training with Rewarded Tree

Paper • 2510.26020 • Published Oct 29, 2025 • 5
POWSM: A Phonetic Open Whisper-Style Speech Foundation Model

Paper • 2510.24992 • Published Oct 28, 2025 • 4
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

Paper • 2510.24821 • Published Oct 28, 2025 • 41

RL makes MLLMs see better than SFT

Paper • 2510.16333 • Published Oct 18, 2025 • 49
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback

Paper • 2510.16888 • Published Oct 19, 2025 • 22
Reasoning with Sampling: Your Base Model is Smarter Than You Think

Paper • 2510.14901 • Published Oct 16, 2025 • 48
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published Oct 24, 2025 • 31

Multimodal Reasoning

InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning

Paper • 2502.11573 • Published Feb 17, 2025 • 9
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

Paper • 2502.02339 • Published Feb 4, 2025 • 23
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

Paper • 2502.11775 • Published Feb 17, 2025 • 9
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published Dec 24, 2024 • 39

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs