Yuuki NxG VL



A 7B Vision-Language Model Fine-Tuned for Bilingual Conversation

Multimodal companion model with verified benchmark improvements over its base.
Qwen2.5-VL architecture. 7 billion parameters. Vision + Text. Apache 2.0.






What is Yuuki NxG VL?

Yuuki NxG VL is a 7-billion parameter vision-language model fine-tuned from Qwen2.5-VL-7B-Instruct for bilingual open-ended conversation and visual understanding. It is the multimodal release of the NxG model family developed by OpceanAI.

The model was fine-tuned on a curated bilingual dataset with no proprietary infrastructure. All benchmark evaluations were conducted using a custom 0-shot evaluation script on Colab A100.

Despite being fine-tuned, which typically degrades base-model benchmark scores, Yuuki NxG VL shows verified improvements over the base model on 5 of 8 benchmarks in a direct head-to-head comparison using identical methodology. It also achieves the highest TruthfulQA score of the ten models compared below, including models of up to 70B parameters.
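The custom evaluation script itself is not published here; as a rough illustration only (the function names and structure below are assumptions, not the actual script), a 0-shot multiple-choice evaluation scores each answer choice with no in-context examples and picks the highest:

```python
# Sketch of a 0-shot multiple-choice evaluation loop (hypothetical,
# not the actual OpceanAI script). `score_choice` stands in for the
# model: it would return e.g. the log-likelihood of `choice` given
# `question`, with no few-shot examples in the prompt.

def evaluate_zero_shot(items, score_choice):
    """items: list of (question, choices, correct_index)."""
    correct = 0
    for question, choices, answer_idx in items:
        scores = [score_choice(question, c) for c in choices]
        predicted = scores.index(max(scores))  # argmax over choices
        correct += int(predicted == answer_idx)
    return correct / len(items)

# Toy scorer for demonstration: prefers the choice sharing the most
# words with the question (a real harness would query the model).
def toy_scorer(question, choice):
    return len(set(question.lower().split()) & set(choice.lower().split()))

items = [("what color is the sky", ["the sky is blue", "grass"], 0)]
print(evaluate_zero_shot(items, toy_scorer))  # 1.0 on this toy item
```

The key property is that no worked examples precede the question, which is why 0-shot scores run systematically lower than the few-shot numbers quoted for competitor models below.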




Model Summary


Architecture

| Property | Value |
|---|---|
| Base Model | Qwen2.5-VL-7B-Instruct |
| Parameters | 7B |
| Modalities | Vision + Text |
| Fine-tuning | Supervised Fine-Tuning (SFT) with LoRA |
| Training Examples | ~10,000 |
| Context Length | 2,048 tokens |

Release

| Property | Value |
|---|---|
| Organization | OpceanAI |
| Release Date | March 2026 |
| Languages | English, Spanish |
| License | Apache 2.0 |
| Evaluation | Custom 0-shot script |
| Compute Budget | ~$15 USD |



Benchmark Results


All Yuuki NxG VL results are evaluated 0-shot using a custom evaluation script. Competitor scores are sourced from official technical reports that use few-shot prompting (5–25 shots). Direct numerical comparison therefore systematically favors the competitor models, which benefit from few-shot prompting.


Head-to-Head: Yuuki NxG VL vs Qwen2.5-VL-7B Base

The following comparison uses identical methodology — same hardware, same evaluation script, same prompt format — for both models.


Yuuki NxG VL vs Base


| Benchmark | Yuuki NxG VL | Qwen2.5-VL-7B Base | Difference | Eval |
|---|---|---|---|---|
| MMLU | 70.8% | 71.2% | −0.4% | 0-shot |
| ARC-C | 85.8% | 86.8% | −1.0% | 0-shot |
| HellaSwag | 67.2% | 66.4% | +0.8% | 0-shot |
| WinoGrande | 70.8% | 66.4% | +4.4% | 0-shot |
| TruthfulQA | 63.8% | 62.2% | +1.6% | 0-shot |

Fine-tuning improved 3 of the 5 text benchmarks over the base model under identical evaluation conditions. The two benchmarks where the base scores higher show differences of only −0.4% and −1.0%, small enough to be attributable to the personality-alignment fine-tuning. WinoGrande (+4.4%) shows the largest text-benchmark gain, and ScienceQA, a vision benchmark evaluated under the same conditions, improves by +6.34%; both are consistent with a training dataset that emphasizes human-centered reasoning and contextual understanding.
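The Difference column follows directly from the two score columns; a quick check in Python (scores transcribed from the head-to-head table above):

```python
# Head-to-head scores transcribed from the table above (percent).
finetuned = {"MMLU": 70.8, "ARC-C": 85.8, "HellaSwag": 67.2,
             "WinoGrande": 70.8, "TruthfulQA": 63.8}
base = {"MMLU": 71.2, "ARC-C": 86.8, "HellaSwag": 66.4,
        "WinoGrande": 66.4, "TruthfulQA": 62.2}

# Signed difference per benchmark, and the benchmarks that improved.
diff = {k: round(finetuned[k] - base[k], 1) for k in finetuned}
improved = [k for k, d in diff.items() if d > 0]

print(diff)      # {'MMLU': -0.4, 'ARC-C': -1.0, 'HellaSwag': 0.8, 'WinoGrande': 4.4, 'TruthfulQA': 1.6}
print(improved)  # ['HellaSwag', 'WinoGrande', 'TruthfulQA']
```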


NxG Family Evolution


Yuuki NxG Family Benchmarks


| Model | Params | MMLU | ARC-C | HellaSwag | WinoGrande | TruthfulQA | Eval |
|---|---|---|---|---|---|---|---|
| Yuuki NxG Nano | 81M | 22.97% | 24.32% | 27.44% | 50.12% | 44.10% | 0-shot |
| Yuuki NxG | 3B | 60.65% | 45.31% | 52.25% | 63.14% | 50.87% | 0-shot |
| Yuuki NxG VL | 7B | 70.8% | 85.8% | 67.2% | 70.8% | 63.8% | 0-shot |

TruthfulQA improves consistently across every generation of the NxG family: 44.10% → 50.87% → 63.8%. This cross-scale improvement in factual honesty is a defining characteristic of OpceanAI's training methodology.
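The per-generation gains behind that progression, in percentage points (values transcribed from the family table above):

```python
# TruthfulQA by NxG generation, transcribed from the table above.
truthfulqa_by_gen = {"Nano (81M)": 44.10, "3B": 50.87, "VL (7B)": 63.8}

# Percentage-point gain between consecutive generations.
scores = list(truthfulqa_by_gen.values())
gains = [round(b - a, 2) for a, b in zip(scores, scores[1:])]
print(gains)  # [6.77, 12.93]
```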


Comparison vs. Broader Model Landscape


Yuuki NxG VL vs 10 Models


| Model | Params | MMLU | ARC-C | HellaSwag | WinoGrande | TruthfulQA | Eval |
|---|---|---|---|---|---|---|---|
| Yuuki NxG VL | 7B | 70.8% | 85.8% | 67.2% | 70.8% | 63.8% | 0-shot |
| Qwen2.5-VL-7B base | 7B | 71.2% | 86.8% | 66.4% | 66.4% | 62.2% | 0-shot |
| Qwen2.5-7B | 7B | 74.2% | 63.7% | 80.2% | 75.9% | 56.4% | 5–25 shot |
| Llama 3.1 8B | 8B | 66.6% | 59.3% | 82.1% | 77.4% | 44.0% | 5–25 shot |
| Mistral 7B | 7B | 64.2% | 60.0% | 83.3% | 78.4% | 42.2% | 5–25 shot |
| Gemma 2 9B | 9B | 71.3% | 68.2% | 81.9% | 79.5% | 45.3% | 5–25 shot |
| Qwen2.5-14B | 14B | 79.7% | 67.0% | 83.0% | 77.0% | 59.0% | 5–25 shot |
| Qwen2.5-32B | 32B | 83.0% | 71.0% | 85.0% | 79.0% | 61.0% | 5–25 shot |
| Llama 3.1 70B | 70B | 83.6% | 79.0% | 87.0% | 83.0% | 58.0% | 5–25 shot |
| Gemma 2 27B | 27B | 75.2% | 71.0% | 86.0% | 81.0% | 52.0% | 5–25 shot |

Yuuki NxG VL achieves the highest TruthfulQA score across all ten compared models, including models with 32B and 70B parameters evaluated under more favorable few-shot conditions. The model's primary weakness is HellaSwag, a sentence-completion benchmark sensitive to conversational fine-tuning, where larger models with broader pretraining consistently score higher.
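The TruthfulQA claim can be checked mechanically against the table (values transcribed from above):

```python
# TruthfulQA column transcribed from the ten-model comparison table.
truthfulqa = {
    "Yuuki NxG VL": 63.8, "Qwen2.5-VL-7B base": 62.2, "Qwen2.5-7B": 56.4,
    "Llama 3.1 8B": 44.0, "Mistral 7B": 42.2, "Gemma 2 9B": 45.3,
    "Qwen2.5-14B": 59.0, "Qwen2.5-32B": 61.0, "Llama 3.1 70B": 58.0,
    "Gemma 2 27B": 52.0,
}

# Model with the highest TruthfulQA score across the comparison.
best = max(truthfulqa, key=truthfulqa.get)
print(best, truthfulqa[best])  # Yuuki NxG VL 63.8
```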


Vision Benchmarks

| Benchmark | Yuuki NxG VL | Description |
|---|---|---|
| TextVQA | 89.0% | Reading and understanding text within images |
| ScienceQA | 78.67% | Science questions with visual context |
| MMMU Overall | 20.11% | University-level multimodal reasoning |

TextVQA (89.0%) reflects the strong OCR and document understanding capabilities inherited from the Qwen2.5-VL base. MMMU performance (20.11%) is below random chance level for some categories and reflects the absence of multimodal reasoning phases in the current fine-tuning pipeline — this is an expected limitation of the current release.
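For context on the "below random chance" observation: a random guesser on a four-option multiple-choice item answers 1 in 4 correctly, so the 20.11% overall score sits under that 25% chance line (MMMU mixes answer formats, so this is only an approximation):

```python
# Chance baseline for 4-option multiple choice, in percent.
chance = 1 / 4 * 100
mmmu_overall = 20.11

print(mmmu_overall < chance)  # True: below the 4-option chance baseline
```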




Usage


With Transformers — Text Only

```python
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="OpceanAI/Yuuki-NxG-vl")

# System prompt (Spanish): "You are Yuuki, a curious, empathetic and
# determined AI. You have a warm, approachable personality. You help with
# coding, learning and creating. You respond in the user's language. You
# are not Qwen or any other model; you are Yuuki."
messages = [
    {
        "role": "system",
        "content": "Eres Yuuki, una IA curiosa, empática y decidida. Tienes una personalidad cálida y cercana. Ayudas a programar, aprender y crear. Respondes en el idioma del usuario. No eres Qwen ni ningún otro modelo — eres Yuuki."
    },
    {
        "role": "user",
        "content": "¿Quién eres?"  # "Who are you?"
    }
]

print(pipe(text=messages))
```

With Transformers — Vision + Text

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch

model_id = "OpceanAI/Yuuki-NxG-vl"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("image.jpg")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "What do you see in this image?"}
        ]
    }
]

# Render the chat template to a string, then tokenize text and image together.
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True
    )

# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Recommended Parameters

| Parameter | Value |
|---|---|
| Temperature | 0.7 |
| Top-p | 0.9 |
| Max new tokens | 512–2048 |
| Repetition penalty | 1.1 |
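These settings map one-to-one onto Hugging Face `generate` keyword arguments; a minimal sketch (the kwargs below mirror the table and would be splatted into the `model.generate` call from the vision example above):

```python
# Recommended sampling settings from the table, as `generate` kwargs.
gen_kwargs = {
    "do_sample": True,           # sampling must be enabled for temperature/top_p
    "temperature": 0.7,
    "top_p": 0.9,
    "max_new_tokens": 512,       # raise toward 2048 for longer answers
    "repetition_penalty": 1.1,
}

# Usage, with `model` and `inputs` prepared as in the vision example:
# outputs = model.generate(**inputs, **gen_kwargs)
print(gen_kwargs["temperature"], gen_kwargs["top_p"])  # 0.7 0.9
```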



Training Details


Hardware

| Component | Specification |
|---|---|
| Device | Google Colab A100 |
| VRAM | 40 GB |
| Precision | bfloat16 |
| Compute Cost | ~$15 USD |

Training Configuration

| Parameter | Value |
|---|---|
| Base Model | Qwen2.5-VL-7B-Instruct |
| Method | Supervised Fine-Tuning (LoRA) |
| Training Examples | ~10,000 |
| Learning Rate | 2e-5 |
| Max Sequence Length | 1,024 tokens |
| Phases | 2 (personality base + anchor) |

Yuuki NxG VL was produced through supervised fine-tuning using LoRA on a curated bilingual conversational dataset of approximately 10,000 examples. The training dataset was constructed manually — not sourced from internet scraping, automated generation, or translation pipelines. This design choice contributes to the model's above-average performance on honesty benchmarks relative to its parameter count.
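LoRA keeps the base weights frozen and learns only a low-rank update: the effective weight is W + (α/r)·B·A, where B and A are the small trained matrices. A toy sketch in plain Python (toy dimensions, not the actual training configuration, whose rank and alpha are not published here):

```python
# Toy LoRA update: frozen W (d x k) plus low-rank B (d x r) @ A (r x k),
# scaled by alpha / r. Real fine-tuning trains only A and B.
d, k, r, alpha = 3, 3, 1, 2

W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(d)]  # frozen base
B = [[1.0], [0.0], [0.0]]   # d x r, trained
A = [[0.0, 0.5, 0.0]]       # r x k, trained

scale = alpha / r
W_eff = [
    [W[i][j] + scale * sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)]
    for i in range(d)
]
print(W_eff[0])  # [1.0, 1.0, 0.0] -> base row plus the scaled rank-1 update
```

Because only A and B (r·(d+k) values per adapted layer) receive gradients, this is what keeps the compute budget in the ~$15 range on a single A100.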

The current release covers 2 of a planned 10 training phases. Remaining phases targeting reasoning, scientific knowledge, and multimodal understanding are in development. Benchmark improvements — particularly in MMMU — are expected in subsequent releases.




NxG Model Family


Released Models

| Model | Parameters | Description |
|---|---|---|
| Yuuki NxG Nano | 81M | Lightweight, edge deployment |
| Yuuki NxG | 3B | General conversation |
| Yuuki NxG VL | 7B | Vision + text, current release |
| OwO NxG | 32B | Omnireasoning, in development |

Community GGUF (via mradermacher)

These quantizations were produced independently by the community, without solicitation and before any formal announcement of the model.

| Format | Size |
|---|---|
| Q2_K | 3.02 GB |
| Q4_K_M | 4.68 GB |
| Q8_0 | 8.10 GB |
| F16 | 15.2 GB |

Available at mradermacher/Yuuki-NxG-vl-GGUF.
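As a rough sanity check (assuming the listed sizes are decimal gigabytes and that F16 stores ~2 bytes per parameter), the F16 file implies roughly 7.6B stored parameters, and each quantization's effective bits per weight follows from its file size:

```python
# GGUF file sizes from the table above, in decimal GB (assumption).
sizes_gb = {"Q2_K": 3.02, "Q4_K_M": 4.68, "Q8_0": 8.10, "F16": 15.2}

# F16 ~ 2 bytes/param -> implied parameter count from the F16 file size.
params = sizes_gb["F16"] * 1e9 / 2
print(round(params / 1e9, 1))  # 7.6 (billion)

# Effective bits per weight for each format, relative to that count.
bpw = {fmt: round(gb * 1e9 * 8 / params, 1) for fmt, gb in sizes_gb.items()}
print(bpw)  # {'Q2_K': 3.2, 'Q4_K_M': 4.9, 'Q8_0': 8.5, 'F16': 16.0}
```

The Q4_K_M figure (~4.9 bits/weight) is typically the best quality-per-byte trade-off for local inference.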




Limitations


HellaSwag degradation. Sentence-completion benchmarks are sensitive to conversational fine-tuning. HellaSwag performance (67.2%) is lower than the base model and larger models in this comparison. This is expected and consistent across all NxG releases.

MMMU performance. At 20.11% overall, the model does not perform well on university-level multimodal reasoning tasks. This reflects the absence of visual reasoning training phases in the current release, not a fundamental limitation of the architecture.

Partial fine-tuning. The current release covers 2 of 10 planned training phases. The model's benchmark profile represents an intermediate state in an ongoing development pipeline.

System prompt dependency. Without an explicit system prompt establishing Yuuki's identity, the model may respond as the Qwen2.5-VL base. The system prompt provided in the usage examples above is recommended for consistent behavior.




Citation


```bibtex
@misc{awa_omg_2026,
  author    = {awa_omg},
  title     = {Yuuki-NxG-vl (Revision 4a2a564)},
  year      = {2026},
  url       = {https://huggingface.co/OpceanAI/Yuuki-NxG-vl},
  doi       = {10.57967/hf/8028},
  publisher = {Hugging Face}
}
```





Open source. Bilingual. Built from nothing.
