Collections
Collections including paper arxiv:2506.10521
- Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
  Paper • 2506.10521 • Published • 73
- Manalyzer: End-to-end Automated Meta-analysis with Multi-agent System
  Paper • 2505.20310 • Published • 1
- InternScience/SFE
  Viewer • Updated • 1.66k • 3.8k • 16
- PrismaX/Manalyzer
  Viewer • Updated • 6.66k • 12 • 2

- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 31
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 109
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 27
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 106

- Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
  Paper • 2407.07053 • Published • 47
- LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
  Paper • 2407.12772 • Published • 35
- VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
  Paper • 2407.11691 • Published • 15
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
  Paper • 2408.02718 • Published • 62

- ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
  Paper • 2505.19897 • Published • 104
- Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
  Paper • 2506.10521 • Published • 73
- AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
  Paper • 2506.10974 • Published • 19

- MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
  Paper • 2505.10557 • Published • 47
- AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
  Paper • 2505.16400 • Published • 35
- PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
  Paper • 2505.15929 • Published • 49
- VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
  Paper • 2506.05349 • Published • 24

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23