See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding
This is the bilingual English/Chinese Baichuan2-7B-Chat VLM, trained via LoRA for the paper at https://arxiv.org/abs/2406.11665.
The Chinese half of the training data used for multimodal alignment and visual instruction tuning is sampled from here.