ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Paper • 2512.05111 • Published Dec 4, 2025 • 47
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better Paper • 2506.09040 • Published Jun 10, 2025 • 34
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Paper • 2406.14544 • Published Jun 20, 2024 • 35