Hypa-Gemma4 E2B

A multilingual, tool-aware fine-tune of Google's Gemma 4 E2B for low-resource and underrepresented languages.

Model Description

Hypa-Gemma4 E2B (hypaai/Hypa-Gemma4-E2B-v1) is a LoRA-merged fine-tune of Google DeepMind's Gemma 4 E2B-it, produced by Hypa Intelligence. It is the first model released in our open research line on adapting modern open foundation models for low-resource and underrepresented languages, with a deliberate focus on retaining the base model's tool-aware and agentic prompting structure.

This release covers seventeen languages: English, French, Spanish, and fourteen languages of Nigeria. Several of the smaller languages in this set (including Annang, Eggon, Idoma, Igala, Nupe, and Urhobo) have not been formally represented in large-scale fine-tuning corpora before, or had no settled ISO-style language tag at the time we needed one.

The model is intended for translation, language detection, dictionary-style explanation, and general multilingual instruction-following. It inherits Gemma 4's native chat template, system / user / model role structure, and dedicated formatting for thinking and tool use.

Property	Value
Base model	`google/gemma-4-E2B-it` (2.3B effective, 5.1B with PLE)
Method	LoRA (r=1024, α=1024) via Unsloth + QLoRA, then merged to 16-bit
Trainable parameters	1.62B / 6.74B (24.04%)
Training data	15.9M examples across 11 concatenated sub-datasets
Compute	1× NVIDIA H200 SXM, 7.41 days
Languages	17
Context window	128K (inherited from base model)
License	Apache 2.0

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "hypaai/Hypa-Gemma4-E2B-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert Igbo translator."},
    {"role": "user", "content": "Translate to Igbo: Good morning, how are you today?"},
]

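# return_dict=True returns input_ids and attention_mask together,
# regardless of the installed transformers version's default.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))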

For thinking mode, pass enable_thinking=True to apply_chat_template. For tool-use payloads, use the standard Gemma 4 messages-with-tools format.

Languages Covered

Code	Language	Code	Language
`en`	English	`ibb`	Ibibio
`ann`	Annang	`idm`	Idoma
`efi`	Efik	`igl`	Igala
`ebi`	Ebira	`ig`	Igbo
`ego`	Eggon	`nup`	Nupe
`es`	Spanish	`pg`	Pidgin
`fr`	French	`tiv`	Tiv
`ha`	Hausa	`urh`	Urhobo
`yo`	Yoruba

Some of the smaller languages in this set required custom or non-standard tags because no widely-adopted machine-readable code existed at the time of training. Where ISO 639-3 codes were available, we used them; where they were not, we documented our internal codes in the data release so downstream users can reproduce splits.
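
The code table above maps naturally onto a small lookup structure. The sketch below is illustrative only: `LANGUAGES` copies the table, while `build_translation_messages` is a hypothetical helper (not part of any Hypa release) that reuses the prompt wording from the Quick Start example.

```python
# Language tags copied from the table above (17 entries).
LANGUAGES = {
    "en": "English", "ann": "Annang", "efi": "Efik", "ebi": "Ebira",
    "ego": "Eggon", "es": "Spanish", "fr": "French", "ha": "Hausa",
    "yo": "Yoruba", "ibb": "Ibibio", "idm": "Idoma", "igl": "Igala",
    "ig": "Igbo", "nup": "Nupe", "pg": "Pidgin", "tiv": "Tiv",
    "urh": "Urhobo",
}


def build_translation_messages(target_code: str, text: str) -> list:
    """Hypothetical helper: build chat messages in the Quick Start style."""
    if target_code not in LANGUAGES:
        raise KeyError(f"Unsupported language tag: {target_code!r}")
    language = LANGUAGES[target_code]
    return [
        {"role": "system", "content": f"You are an expert {language} translator."},
        {"role": "user", "content": f"Translate to {language}: {text}"},
    ]


messages = build_translation_messages("ig", "Good morning, how are you today?")
```

Rejecting unknown tags early avoids silently prompting the model for a language outside the seventeen it was tuned on.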

Training Data

Training data comprises 15.9 million examples assembled from eleven Hypa Intelligence sub-datasets, each contributing a different signal:

Synthetic_Dictionary_Text_ONLY, Synthetic_Dictionary_FF_CC_Text_ONLY, JSON_Dictionary_Text_ONLY — three views (prose, cloze, structured JSON) of a curated multilingual dictionary, providing lexical grounding across all target languages.
Fleurs9000_Text_ONLY — ~9,000 text-only examples derived from FLEURS, providing parallel translations across the language set.
CommonVoice_35k_Text_ONLY, CommonVoice_15k_Text_ONLY — transcript-only data drawn from Mozilla CommonVoice, providing real-world spoken-language distribution as text.
cv_15k_translation, cv_15k_detection — CommonVoice transcripts reformulated into translation and language-detection tasks.
cbp_translation, cbp_detection — parallel translation and detection pairs from a community-sourced parallel corpus with broad coverage of the smaller languages.
gpt_oss_synthetic — general-purpose synthetic instruction data generated using open-source LLMs, included specifically to mitigate catastrophic forgetting of the base model's instruction-following ability.

A public 10k subset of the training data is released as hypaai/Hypa-Text-10k. Additional sub-datasets are progressively being released under the hypaai organization.

Prompt Formatting

Every example was formatted using Gemma 4's native chat template, with explicit system, user, and model roles and dedicated control tokens for thinking (<|think|>, <|channel>thought ... <channel|>) and tool use. Loss was computed only on assistant turns via train_on_responses_only with instruction_part="<|turn>user\n" and response_part="<|turn>model\n".
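
To make "loss only on assistant turns" concrete, here is a minimal, library-free sketch of the masking idea. It is not Unsloth's `train_on_responses_only` implementation: the whitespace "tokenizer", the single-token markers, and the `mask_non_response` helper are all assumptions for illustration. Labels outside a model turn are set to -100, the index that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_non_response(tokens, token_ids, instruction_marker, response_marker):
    """Return labels that keep token ids only inside model (response) turns.

    Toy version: each turn marker is assumed to be a single token.
    """
    labels = []
    in_response = False
    for token, token_id in zip(tokens, token_ids):
        if token == response_marker:
            in_response = True
            labels.append(IGNORE_INDEX)  # the marker itself is not a target
        elif token == instruction_marker:
            in_response = False
            labels.append(IGNORE_INDEX)
        else:
            labels.append(token_id if in_response else IGNORE_INDEX)
    return labels


# Toy whitespace split of a one-exchange conversation (illustrative only).
tokens = "<|turn>user Translate hello <|turn>model Bonjour".split()
token_ids = list(range(len(tokens)))  # stand-in ids: 0, 1, 2, 3, 4
labels = mask_non_response(tokens, token_ids, "<|turn>user", "<|turn>model")
print(labels)  # [-100, -100, -100, -100, 4]
```

Only the final token, which belongs to the model turn, contributes to the loss; the user prompt and both markers are ignored.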

Training Procedure

Hyperparameter	Value
LoRA rank (r)	1024
LoRA alpha (α)	1024
LoRA dropout	0
Target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Vision / audio modules	frozen
Quantization	4-bit base, bf16 compute
Optimizer	AdamW 8-bit
Learning rate	1e-5
LR schedule	cosine, 500 warmup steps
Weight decay	0.001
Max grad norm	1.0
Per-device batch size	32
Gradient accumulation	2
Effective batch size	64
Sequence length	2048
Epochs	1
Total steps	248,862
Precision	bfloat16
Gradient checkpointing	enabled (Unsloth)
Hardware	1× NVIDIA H200 SXM (Runpod)
Runtime	7.41 days
Compute (approx)	2.11 × 10²⁰ FLOPs
Random seed	3407
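
The batch and step figures in the table can be cross-checked with a few lines of arithmetic. The sketch assumes one optimizer step per effective batch on the single GPU listed; under that assumption, one epoch of 248,862 steps corresponds to roughly the 15.9M examples reported.

```python
# Values copied from the hyperparameter table.
per_device_batch = 32
grad_accum_steps = 2
num_devices = 1
total_steps = 248_862
runtime_days = 7.41

effective_batch = per_device_batch * grad_accum_steps * num_devices
examples_seen = total_steps * effective_batch
steps_per_second = total_steps / (runtime_days * 24 * 3600)

print(effective_batch)            # 64
print(f"{examples_seen:,}")       # 15,927,168
print(f"{steps_per_second:.2f}")  # 0.39
```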

Training was performed using Unsloth, which provides day-zero Gemma 4 support and patches the shared-KV-cache interaction with gradient checkpointing that affects naive QLoRA setups in vanilla transformers.

Evaluation and Recommendations

Training metrics

Final training loss: 0.41 (smooth monotonic decay)
Best evaluation loss: 3.128 at step 40,000
Final evaluation loss: 3.577

Honest note on overfitting

Evaluation loss bottomed out at step 40,000 (≈16% of total training) and rose steadily for the remaining ~209,000 steps, ending ~14% above its best value. The training loss continued to decrease throughout. This is a clear overfitting signature: the model memorized the training distribution beyond the point where additional optimization improved generalization.
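
The percentages quoted above follow directly from the reported numbers. This short check uses only values stated in this card (step counts and the two evaluation losses).

```python
best_step, total_steps = 40_000, 248_862
best_eval_loss, final_eval_loss = 3.128, 3.577

fraction_at_best = best_step / total_steps
steps_after_best = total_steps - best_step
relative_rise = final_eval_loss / best_eval_loss - 1

print(f"{fraction_at_best:.1%}")  # 16.1%
print(f"{steps_after_best:,}")    # 208,862
print(f"{relative_rise:.1%}")     # 14.4%
```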

For downstream use, we recommend the LoRA checkpoint at step 40,000 rather than the final merged 16-bit weights published in this repository. The intermediate checkpoint is available at hypaai/Hypa-Gemma4-E2B-v1-LoRAs. The merged 16-bit weights in this repo represent the end-of-training state and are useful as a reference but are not the strongest checkpoint by evaluation loss.
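
A minimal sketch of attaching the recommended adapter with PEFT follows. The `checkpoint-40000` subfolder name is a guess at the repository layout, not something this card states, so check the file listing of hypaai/Hypa-Gemma4-E2B-v1-LoRAs before running it. Whether the adapter's weight names line up with `AutoModelForCausalLM` also depends on how the checkpoint was saved; treat this as a starting point rather than a verified recipe.

```python
BASE_MODEL_ID = "google/gemma-4-E2B-it"
ADAPTER_REPO = "hypaai/Hypa-Gemma4-E2B-v1-LoRAs"
ADAPTER_SUBFOLDER = "checkpoint-40000"  # hypothetical: verify against the repo


def load_step_40000_model():
    """Load the base model and attach the step-40,000 LoRA adapter."""
    # Imported lazily so the constants above can be inspected without
    # torch, transformers, or peft installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID, dtype=torch.bfloat16, device_map="auto"
    )
    model = PeftModel.from_pretrained(
        base, ADAPTER_REPO, subfolder=ADAPTER_SUBFOLDER
    )
    return model, tokenizer
```

Calling `load_step_40000_model()` downloads the base weights and the adapter, so it needs network access and enough memory for the bf16 base model.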

The full discussion of why this happened and what we will change for v2 is in the public write-up. Headline planned fixes: lower LoRA rank, load_best_model_at_end=True with eval_loss as the selection metric, language-held-out evaluation splits, and a lower step count.
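
The effect of selecting on `eval_loss` can be illustrated without any training framework. The sketch below is a simplified stand-in for checkpoint selection of the kind `load_best_model_at_end=True` performs: it keeps whichever evaluated checkpoint has the lowest loss. `select_best_checkpoint` is a hypothetical helper, and the log contains only the two evaluation points reported in this card, with the final loss assumed to fall on the last training step.

```python
def select_best_checkpoint(eval_log):
    """Return the (step, eval_loss) pair with the lowest evaluation loss."""
    if not eval_log:
        raise ValueError("eval_log is empty")
    return min(eval_log, key=lambda entry: entry[1])


# The two reported evaluation points; a real log has one entry per evaluation.
eval_log = [(40_000, 3.128), (248_862, 3.577)]
best_step, best_loss = select_best_checkpoint(eval_log)
print(best_step, best_loss)  # 40000 3.128
```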

Qualitative observations

Internal qualitative review on translation tasks shows substantial improvements over the base Gemma 4 E2B for every language in the set, with the largest deltas on the smallest languages (Annang, Efik, Ibibio), where the base model was effectively unusable. Quantitative chrF++, BLEU, and BLEURT results across language pairs will follow in a separate evaluation post.

Intended Use

Direct use cases:

Translation between English / French / Spanish and the fourteen covered low-resource languages
Language detection across all seventeen languages
Dictionary-style lexical lookup and explanation
Multilingual instruction-following on dialogue tasks
Tool-aware / function-calling-style prompting (inheriting the base model's structure)

Downstream use:

Suitable as a starting point for further fine-tuning on more specialized tasks within the supported languages
Suitable for adapter stacking (e.g., domain-specific LoRA on top)
Suitable for on-device deployment when quantized (E2B at 4-bit fits on mid-range mobile hardware)
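
As a rough sanity check on the on-device point, weight storage scales as parameters × bits / 8. The estimate below covers weights only and assumes every parameter is stored at the stated width; it ignores quantization metadata, activations, and the KV cache, so real footprints will be higher.

```python
def weight_gigabytes(num_params: float, bits_per_param: int) -> float:
    """Weights-only storage in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter counts from the property table: 5.1B with PLE, 2.3B effective.
for label, params in [("5.1B with PLE", 5.1e9), ("2.3B effective", 2.3e9)]:
    print(
        f"{label}: 16-bit {weight_gigabytes(params, 16):.2f} GB, "
        f"4-bit {weight_gigabytes(params, 4):.2f} GB"
    )
```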

Out-of-Scope and Limitations

Not safety-tuned for sensitive domains. This model has not undergone RLHF or DPO post-training. It should not be used unsupervised for medical, legal, financial, or psychological-counseling applications.
Quality varies by language. The smallest languages in the set are underrepresented even within our training mix and the resulting model output should be reviewed by native speakers before being used in production.
Overfitting on the final checkpoint. As noted above, the merged 16-bit weights in this repository correspond to the end of training, not the best evaluation checkpoint. For applications that prioritize generalization, use the step-40,000 LoRA from the companion repository.
Vision and audio components are frozen. This is a text-only fine-tune. The base model's image and audio capabilities are preserved at the underlying weight level but were not exercised during training and have not been validated for our target languages.
Tokenization quality. Gemma 4's 256K-vocabulary tokenizer fragments low-resource languages less aggressively than smaller-vocabulary tokenizers, but the smallest languages in this release still tokenize at higher cost per character than English. This is the gap we expect future iterations to close.
Coverage is finite. The seventeen languages in this release are the start, not the end. Many other underrepresented languages are not yet supported and may produce unreliable output.
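
The tokenization cost mentioned in the list above can be measured as tokens per character on comparable sentences. The helper below is a sketch: `tokens_per_char` and the whitespace tokenizer used to exercise it are assumptions for illustration. In practice you would pass the `tokenize` method of the model's tokenizer and compare parallel sentences across languages.

```python
from typing import Callable, Sequence


def tokens_per_char(tokenize: Callable[[str], Sequence], text: str) -> float:
    """Average number of tokens emitted per character of input text."""
    if not text:
        raise ValueError("text must be non-empty")
    return len(tokenize(text)) / len(text)


# Stand-in tokenizer so the sketch runs without downloading a model.
toy_cost = tokens_per_char(str.split, "Good morning, how are you today?")
print(f"{toy_cost:.4f}")

# With the real tokenizer (requires transformers and network access):
#   tok = AutoTokenizer.from_pretrained("hypaai/Hypa-Gemma4-E2B-v1")
#   tokens_per_char(tok.tokenize, sentence_in_target_language)
```

A higher value for one language than for English on parallel text means that language consumes more of the context window and more compute per character.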

Bias, Risks, and Limitations

This model inherits the biases and limitations of its base model (Google Gemma 4) and adds the biases of its fine-tuning corpus, which is weighted toward dictionary, religious-parallel, and CommonVoice text. Religious-parallel text in particular is a known cause of register and content bias in low-resource translation models. Users deploying this model in customer-facing applications should evaluate output for cultural appropriateness in their specific use case and language.

The model is not intended to make decisions affecting people's rights, health, finances, or wellbeing. Like all language models, it can produce confident-sounding output that is incorrect, particularly on the smallest languages where training data was thinnest.

Released Artifacts

🤗 Merged 16-bit model (this repo): hypaai/Hypa-Gemma4-E2B-v1
🤗 LoRA adapter checkpoints: hypaai/Hypa-Gemma4-E2B-v1-LoRAs
📊 TensorBoard metrics: view on HF
📦 Public training data subset: hypaai/Hypa-Text-10k
💻 GitHub repository: hypaai/Hypa-Gemma
📝 Blog post: Tuning Gemma 4 for multilingual and tool-aware language understanding (Hashnode mirror)

Citation

If you use Hypa-Gemma4 E2B or any of the related work, please cite:

@misc{hypaai2026hypagemma4e2b,
  title        = {Hypa-Gemma4 E2B: A Multilingual Fine-Tune of Gemma 4 for Underrepresented Languages},
  author       = {{Hypa Intelligence}},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/hypaai/Hypa-Gemma4-E2B-v1}},
  note         = {Apache 2.0 License. Blog: \url{https://hypaintelligence.com/updates/tuning-gemma-4-for-multilingual-and-tool-aware-language-understanding}}
}

License

Released under the Apache License 2.0, the same license as the Gemma 4 base model. You may use, modify, and redistribute the model for both research and commercial purposes, with no monthly active user caps. As with any Apache 2.0 work, redistributions must include a copy of the license and retain existing copyright and attribution notices.

Acknowledgments

Google DeepMind for releasing Gemma 4 under Apache 2.0 and for its 140+ language pretraining commitment.
Unsloth for day-zero Gemma 4 support and for handling the KV-sharing interaction with gradient checkpointing.
Runpod for reliable H200 infrastructure.
The language communities, speakers, and reviewers whose texts, voices, and feedback grounded this work and keep it honest.

Hypa Intelligence • Website • Hugging Face • GitHub • Blog

Multilingualism is not a feature. It is a prerequisite for AI that represents all of us.
