Collections
Collections including paper arxiv:2506.10521
- Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
  Paper • 2506.10521 • Published • 73
- Manalyzer: End-to-end Automated Meta-analysis with Multi-agent System
  Paper • 2505.20310 • Published • 1
- InternScience/SFE
  Viewer • Updated • 1.66k • 3.8k • 16
- PrismaX/Manalyzer
  Viewer • Updated • 6.66k • 12 • 2

- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 31
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 109
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 27
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 106

- Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
  Paper • 2407.07053 • Published • 47
- LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
  Paper • 2407.12772 • Published • 35
- VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
  Paper • 2407.11691 • Published • 15
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
  Paper • 2408.02718 • Published • 62

- ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
  Paper • 2505.19897 • Published • 104
- Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
  Paper • 2506.10521 • Published • 73
- AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
  Paper • 2506.10974 • Published • 19

- MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
  Paper • 2505.10557 • Published • 47
- AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
  Paper • 2505.16400 • Published • 35
- PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
  Paper • 2505.15929 • Published • 49
- VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
  Paper • 2506.05349 • Published • 24

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23