Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development Paper • 2603.27460 • Published 7 days ago • 62
LongCat-Next: Lexicalizing Modalities as Discrete Tokens Paper • 2603.27538 • Published 7 days ago • 131
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published 11 days ago • 60
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining Paper • 2603.15030 • Published 19 days ago • 21
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models Paper • 2603.17117 • Published 18 days ago • 87
WildActor: Unconstrained Identity-Preserving Video Generation Paper • 2603.00586 • Published Feb 28 • 37
HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images Paper • 2603.02210 • Published Mar 2 • 29
HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images Paper • 2603.02210 • Published Mar 2 • 29
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published Mar 3 • 102
HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images Paper • 2603.02210 • Published Mar 2 • 29
DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning Paper • 2602.19895 • Published Feb 23 • 13
DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning Paper • 2602.19895 • Published Feb 23 • 13
Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control Paper • 2602.18422 • Published Feb 20 • 30
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published Feb 2 • 140
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos Paper • 2602.06949 • Published Feb 6 • 36