JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation Paper • 2512.22905 • Published 18 days ago • 18
Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks Paper • 2510.25760 • Published Oct 29, 2025 • 16
Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods Paper • 2510.07143 • Published Oct 8, 2025 • 12
PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era Paper • 2509.12989 • Published Sep 16, 2025 • 28