Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published 4 days ago • 213
Sparse Mixture-of-Experts are Domain Generalizable Learners Paper • 2206.04046 • Published Jun 8, 2022 • 1
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models Paper • 2403.20331 • Published Mar 29, 2024 • 16
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024 • 35
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey Paper • 2407.21794 • Published Jul 31, 2024 • 6
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning Paper • 2506.13654 • Published Jun 16, 2025 • 43
VideoLucy: Deep Memory Backtracking for Long Video Understanding Paper • 2510.12422 • Published Oct 14, 2025 • 1
HippoCamp: Benchmarking Contextual Agents on Personal Computers Paper • 2604.01221 • Published 8 days ago • 27
Towards Language-Driven Video Inpainting via Multimodal Large Language Models Paper • 2401.10226 • Published Jan 18, 2024 • 2
OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection Paper • 2306.09301 • Published Jun 15, 2023 • 1
Large Language Models are Visual Reasoning Coordinators Paper • 2310.15166 • Published Oct 23, 2023 • 2
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning Paper • 2603.26653 • Published 13 days ago • 18