A 7B Vision-Language Model Fine-Tuned for Bilingual Conversation
Multimodal companion model with verified benchmark improvements over its base.
Qwen2.5-VL architecture. 7 billion parameters. Vision + Text. Apache 2.0.
What is Yuuki NxG VL?
Yuuki NxG VL is a 7-billion parameter vision-language model fine-tuned from Qwen2.5-VL-7B-Instruct for bilingual open-ended conversation and visual understanding. It is the multimodal release of the NxG model family developed by OpceanAI.
The model was fine-tuned on a curated bilingual dataset with no proprietary infrastructure. All benchmark evaluations were conducted with a custom 0-shot evaluation script on a Colab A100 GPU.
Fine-tuning typically degrades a base model's benchmark scores, yet Yuuki NxG VL achieves verified improvements over its base on 5 of 8 benchmarks in a direct head-to-head comparison using identical methodology. It also achieves the highest TruthfulQA score of all 10 compared models, including models of up to 70B parameters.
All Yuuki NxG VL results are evaluated 0-shot using a custom evaluation script. Competitor scores are sourced from official technical reports, which use few-shot prompting (5–25 shots). Direct numerical comparison therefore systematically favors the models evaluated with few-shot prompting.
Head-to-Head: Yuuki NxG VL vs Qwen2.5-VL-7B Base
The following comparison uses identical methodology — same hardware, same evaluation script, same prompt format — for both models.
| Benchmark | Yuuki NxG VL | Qwen2.5-VL-7B Base | Difference | Eval |
|---|---|---|---|---|
| MMLU | 70.8% | 71.2% | −0.4% | 0-shot |
| ARC-C | 85.8% | 86.8% | −1.0% | 0-shot |
| HellaSwag | 67.2% | 66.4% | +0.8% | 0-shot |
| WinoGrande | 70.8% | 66.4% | +4.4% | 0-shot |
| TruthfulQA | 63.8% | 62.2% | +1.6% | 0-shot |
Fine-tuning improved 3 of the 5 text benchmarks over the base model under identical evaluation conditions. The two regressions (MMLU −0.4%, ARC-C −1.0%) are within the margin expected from personality alignment. The largest gains are WinoGrande (+4.4%) among the text benchmarks and ScienceQA (+6.34%) among the vision benchmarks, consistent with a training dataset that emphasizes human-centered reasoning and contextual understanding.
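The difference column above can be recomputed directly from the raw scores; a minimal check in plain Python, with the values copied from the head-to-head table:

```python
# Head-to-head scores copied from the table above (percent, 0-shot)
yuuki = {"MMLU": 70.8, "ARC-C": 85.8, "HellaSwag": 67.2,
         "WinoGrande": 70.8, "TruthfulQA": 63.8}
base = {"MMLU": 71.2, "ARC-C": 86.8, "HellaSwag": 66.4,
        "WinoGrande": 66.4, "TruthfulQA": 62.2}

# Difference column: positive means the fine-tune improved on the base
diff = {k: round(yuuki[k] - base[k], 1) for k in yuuki}
improved = [k for k, d in diff.items() if d > 0]

print(diff)      # {'MMLU': -0.4, 'ARC-C': -1.0, 'HellaSwag': 0.8, 'WinoGrande': 4.4, 'TruthfulQA': 1.6}
print(improved)  # ['HellaSwag', 'WinoGrande', 'TruthfulQA']
```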
NxG Family Evolution
| Model | Params | MMLU | ARC-C | HellaSwag | WinoGrande | TruthfulQA | Eval |
|---|---|---|---|---|---|---|---|
| Yuuki NxG Nano | 81M | 22.97% | 24.32% | 27.44% | 50.12% | 44.10% | 0-shot |
| Yuuki NxG | 3B | 60.65% | 45.31% | 52.25% | 63.14% | 50.87% | 0-shot |
| Yuuki NxG VL | 7B | 70.8% | 85.8% | 67.2% | 70.8% | 63.8% | 0-shot |
TruthfulQA improves consistently across every generation of the NxG family: 44.10% → 50.87% → 63.8%. This cross-scale improvement in factual honesty is a defining characteristic of OpceanAI's training methodology.
Comparison vs. Broader Model Landscape
| Model | Params | MMLU | ARC-C | HellaSwag | WinoGrande | TruthfulQA | Eval |
|---|---|---|---|---|---|---|---|
| Yuuki NxG VL | 7B | 70.8% | 85.8% | 67.2% | 70.8% | 63.8% | 0-shot |
| Qwen2.5-VL-7B base | 7B | 71.2% | 86.8% | 66.4% | 66.4% | 62.2% | 0-shot |
| Qwen2.5-7B | 7B | 74.2% | 63.7% | 80.2% | 75.9% | 56.4% | 5–25 shot |
| Llama 3.1 8B | 8B | 66.6% | 59.3% | 82.1% | 77.4% | 44.0% | 5–25 shot |
| Mistral 7B | 7B | 64.2% | 60.0% | 83.3% | 78.4% | 42.2% | 5–25 shot |
| Gemma 2 9B | 9B | 71.3% | 68.2% | 81.9% | 79.5% | 45.3% | 5–25 shot |
| Qwen2.5-14B | 14B | 79.7% | 67.0% | 83.0% | 77.0% | 59.0% | 5–25 shot |
| Qwen2.5-32B | 32B | 83.0% | 71.0% | 85.0% | 79.0% | 61.0% | 5–25 shot |
| Llama 3.1 70B | 70B | 83.6% | 79.0% | 87.0% | 83.0% | 58.0% | 5–25 shot |
| Gemma 2 27B | 27B | 75.2% | 71.0% | 86.0% | 81.0% | 52.0% | 5–25 shot |
Yuuki NxG VL achieves the highest TruthfulQA score across all ten compared models, including models with 32B and 70B parameters evaluated under more favorable few-shot conditions. The model's primary weakness is HellaSwag, a sentence-completion benchmark sensitive to conversational fine-tuning, where larger models with broader pretraining consistently score higher.
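The TruthfulQA claim can be checked mechanically against the table; the scores below are copied from the comparison above:

```python
# TruthfulQA column from the broader comparison table (percent)
truthfulqa = {
    "Yuuki NxG VL": 63.8, "Qwen2.5-VL-7B base": 62.2, "Qwen2.5-7B": 56.4,
    "Llama 3.1 8B": 44.0, "Mistral 7B": 42.2, "Gemma 2 9B": 45.3,
    "Qwen2.5-14B": 59.0, "Qwen2.5-32B": 61.0, "Llama 3.1 70B": 58.0,
    "Gemma 2 27B": 52.0,
}

# Highest-scoring model across all ten entries
best = max(truthfulqa, key=truthfulqa.get)
print(best, truthfulqa[best])  # Yuuki NxG VL 63.8
```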
Vision Benchmarks
| Benchmark | Yuuki NxG VL | Description |
|---|---|---|
| TextVQA | 89.0% | Reading and understanding text within images |
| ScienceQA | 78.67% | Science questions with visual context |
| MMMU Overall | 20.11% | University-level multimodal reasoning |
TextVQA (89.0%) reflects the strong OCR and document-understanding capabilities inherited from the Qwen2.5-VL base. MMMU performance (20.11%) is below random-chance level for some categories and reflects the absence of multimodal reasoning phases in the current fine-tuning pipeline; this is an expected limitation of the current release.
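As a rough reference point (an approximation, since MMMU mixes question formats: many questions are four-option multiple choice, but some have more options or are open-ended), uniform random guessing over four options gives:

```python
# Assumed typical MMMU format: four answer options per question
n_options = 4
random_chance = 1 / n_options
mmmu_score = 0.2011  # Yuuki NxG VL overall, from the table above

print(f"random chance = {random_chance:.0%}")  # random chance = 25%
print(mmmu_score < random_chance)              # True
```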
With Transformers — Text Only
```python
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="OpceanAI/Yuuki-NxG-vl")

messages = [
    {
        "role": "system",
        # Translation: "You are Yuuki, a curious, empathetic, and determined AI.
        # You have a warm, approachable personality. You help with programming,
        # learning, and creating. You reply in the user's language. You are not
        # Qwen or any other model; you are Yuuki."
        "content": "Eres Yuuki, una IA curiosa, empática y decidida. Tienes una personalidad cálida y cercana. Ayudas a programar, aprender y crear. Respondes en el idioma del usuario. No eres Qwen ni ningún otro modelo — eres Yuuki."
    },
    {
        "role": "user",
        "content": "¿Quién eres?"  # "Who are you?"
    }
]

print(pipe(text=messages))
```
With Transformers — Vision + Text
```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch

model_id = "OpceanAI/Yuuki-NxG-vl"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("image.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What do you see in this image?"}
        ]
    }
]

# Build the chat prompt, then tokenize text and image together
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True
    )

# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Recommended Parameters
| Parameter | Value |
|---|---|
| Temperature | 0.7 |
| Top-p | 0.9 |
| Max new tokens | 512–2048 |
| Repetition penalty | 1.1 |
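These values map directly onto `generate()` keyword arguments. A minimal sketch; the dict below simply restates the table and would be splatted into a call like the one in the vision example above:

```python
# Recommended sampling parameters from the table above
gen_kwargs = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "max_new_tokens": 512,       # raise toward 2048 for longer replies
    "repetition_penalty": 1.1,
}

# Usage (with `model` and `inputs` prepared as in the vision example):
# outputs = model.generate(**inputs, **gen_kwargs)
print(sorted(gen_kwargs))
```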
Training Configuration
Yuuki NxG VL was produced through supervised fine-tuning using LoRA on a curated bilingual conversational dataset of approximately 10,000 examples. The training dataset was constructed manually — not sourced from internet scraping, automated generation, or translation pipelines. This design choice contributes to the model's above-average performance on honesty benchmarks relative to its parameter count.
The current release covers 2 of a planned 10 training phases. Remaining phases targeting reasoning, scientific knowledge, and multimodal understanding are in development. Benchmark improvements — particularly in MMMU — are expected in subsequent releases.
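The card does not publish the LoRA hyperparameters, so every value below is an illustrative placeholder (rank, alpha, dropout, and target modules are assumptions, not OpceanAI's actual configuration); a sketch of what a comparable LoRA SFT setup might look like:

```python
# Hypothetical LoRA fine-tuning configuration. All values here are
# assumptions for illustration, not the settings actually used for Yuuki.
lora_config = {
    "r": 16,                 # LoRA rank (assumed)
    "lora_alpha": 32,        # scaling factor (assumed)
    "lora_dropout": 0.05,    # (assumed)
    # Attention projections, a typical choice for Qwen-style transformers
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}

# With the peft library, this dict would be passed as peft.LoraConfig(**lora_config)
print(sorted(lora_config))
```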
Released Models
Community GGUF (via mradermacher): quantized independently, without solicitation; organic community adoption prior to any formal announcement. Available at mradermacher/Yuuki-NxG-vl-GGUF.
Limitations
HellaSwag degradation. Sentence-completion benchmarks are sensitive to conversational fine-tuning. HellaSwag performance (67.2%) is lower than both the base model and the larger models in this comparison. This is expected and consistent across all NxG releases.
MMMU performance. At 20.11% overall, the model does not perform well on university-level multimodal reasoning tasks. This reflects the absence of visual reasoning training phases in the current release, not a fundamental limitation of the architecture.
Partial fine-tuning. The current release covers 2 of 10 planned training phases. The model's benchmark profile represents an intermediate state in an ongoing development pipeline.
System prompt dependency. Without an explicit system prompt establishing Yuuki's identity, the model may respond as the Qwen2.5-VL base. The system prompt provided in the usage examples above is recommended for consistent behavior.
Citation
```bibtex
@misc{awa_omg_2026,
  author    = {awa_omg},
  title     = {Yuuki-NxG-vl (Revision 4a2a564)},
  year      = {2026},
  url       = {https://huggingface.co/OpceanAI/Yuuki-NxG-vl},
  doi       = {10.57967/hf/8028},
  publisher = {Hugging Face}
}
```