
wang

stormthunder

AI & ML interests

None yet

Organizations

foundation-multimodal-models

authored a paper 3 months ago

SAIL-VL2 Technical Report

Paper • 2509.14033 • Published Sep 17 • 44
authored 9 papers 6 months ago

SAILViT: Towards Robust and Generalizable Visual Backbones for MLLMs via Gradual Feature Refinement

Paper • 2507.01643 • Published Jul 2 • 2

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

Paper • 2507.07999 • Published Jul 10 • 49

Benchmarking and Improving Detail Image Caption

Paper • 2405.19092 • Published May 29, 2024

Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

Paper • 2405.17871 • Published May 28, 2024 • 1

World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering

Paper • 2409.20424 • Published Sep 30, 2024

UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?

Paper • 2503.09949 • Published Mar 13 • 5

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer

Paper • 2504.10462 • Published Apr 14 • 15

Unveiling the Tapestry of Consistency in Large Vision-Language Models

Paper • 2405.14156 • Published May 23, 2024

VGR: Visual Grounded Reasoning

Paper • 2506.11991 • Published Jun 13 • 19