Future Optical Flow Prediction Improves Robot Control & Video Generation Paper • 2601.10781 • Published 13 days ago • 19
Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding Paper • 2512.05774 • Published Dec 5, 2025 • 7
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models Paper • 2507.12806 • Published Jul 17, 2025 • 21
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14, 2025 • 99
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6, 2025 • 96
Unifying Specialized Visual Encoders for Video Language Models Paper • 2501.01426 • Published Jan 2, 2025 • 20
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Paper • 2411.04282 • Published Nov 6, 2024 • 37