Hypa-Gemma4 E2B
A multilingual, tool-aware fine-tune of Google's Gemma 4 E2B for low-resource and underrepresented languages.
Model Description
Hypa-Gemma4 E2B (hypaai/Hypa-Gemma4-E2B-v1) is a LoRA-merged fine-tune of Google DeepMind's Gemma 4 E2B-it, produced by Hypa Intelligence. It is the first model released in our open research line on adapting modern open foundation models for low-resource and underrepresented languages, with a deliberate focus on retaining the base model's tool-aware and agentic prompting structure.
This release covers seventeen languages: English, French, Spanish, and fourteen languages of Nigeria. Several of the smaller languages in this set (including Annang, Eggon, Idoma, Igala, Nupe, and Urhobo) have not been formally represented in large-scale fine-tuning corpora before, or had no settled ISO-style language tag at the time we needed one.
The model is intended for translation, language detection, dictionary-style explanation, and general multilingual instruction-following. It inherits Gemma 4's native chat template, system / user / model role structure, and dedicated formatting for thinking and tool use.
| Property | Value |
|---|---|
| Base model | google/gemma-4-E2B-it (2.3B effective, 5.1B with PLE) |
| Method | LoRA (r=1024, α=1024) via Unsloth + QLoRA, then merged to 16-bit |
| Trainable parameters | 1.62B / 6.74B (24.04%) |
| Training data | 15.9M examples across 11 concatenated sub-datasets |
| Compute | 1× NVIDIA H200 SXM, 7.41 days |
| Languages | 17 |
| Context window | 128K (inherited from base model) |
| License | Apache 2.0 |
Quick Start
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hypaai/Hypa-Gemma4-E2B-v1"

# Load the merged 16-bit weights in bfloat16 and let device_map place them.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are an expert Igbo translator."},
    {"role": "user", "content": "Translate to Igbo: Good morning, how are you today?"},
]

# return_dict=True yields input_ids and attention_mask together.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
For thinking mode, pass enable_thinking=True to apply_chat_template. For tool-use payloads, use the standard Gemma 4 messages-with-tools format.
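As a rough sketch of how the thinking flag and a tool definition might be passed through the chat template, the helper below forwards `enable_thinking` (named above) and a `tools` list to `apply_chat_template`. The `render_prompt` helper, the `get_exchange_rate` tool, and its schema are hypothetical. The JSON-schema tool layout is the generic one that Transformers chat templates accept, and it is an assumption that it matches the tool payloads seen during fine-tuning.

```python
from typing import Any, Optional

# Hypothetical tool definition in the JSON-schema layout that Transformers
# chat templates accept; the name and fields are illustrative only.
EXCHANGE_RATE_TOOL: dict = {
    "type": "function",
    "function": {
        "name": "get_exchange_rate",
        "description": "Look up the exchange rate between two currencies.",
        "parameters": {
            "type": "object",
            "properties": {
                "base": {"type": "string", "description": "Currency to convert from."},
                "quote": {"type": "string", "description": "Currency to convert to."},
            },
            "required": ["base", "quote"],
        },
    },
}


def render_prompt(
    tokenizer: Any,
    messages: list,
    tools: Optional[list] = None,
    thinking: bool = False,
) -> str:
    """Render a chat prompt as text so the control tokens can be inspected."""
    return tokenizer.apply_chat_template(
        messages,
        tools=tools,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,  # extra keyword arguments reach the template
    )
```

With the tokenizer from the Quick Start, `render_prompt(tokenizer, messages, tools=[EXCHANGE_RATE_TOOL], thinking=True)` returns the rendered prompt string, which is a convenient way to check how the template lays out the thinking and tool sections before generating.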
Languages Covered
| Code | Language | Code | Language |
|---|---|---|---|
| en | English | ibb | Ibibio |
| ann | Annang | idm | Idoma |
| efi | Efik | igl | Igala |
| ebi | Ebira | ig | Igbo |
| ego | Eggon | nup | Nupe |
| es | Spanish | pg | Pidgin |
| fr | French | tiv | Tiv |
| ha | Hausa | urh | Urhobo |
| yo | Yoruba | | |
Some of the smaller languages in this set required custom or non-standard tags because no widely-adopted machine-readable code existed at the time of training. Where ISO 639-3 codes were available, we used them; where they were not, we documented our internal codes in the data release so downstream users can reproduce splits.
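For convenience, the codes in the table can be kept in a small lookup and used to build prompts. This is a minimal sketch: the `build_translation_messages` helper is hypothetical, and its wording simply copies the Quick Start example, so it is an assumption that the same phrasing suits every language in the set.

```python
# Language codes exactly as listed in the table above.
LANGUAGES = {
    "en": "English", "ann": "Annang", "efi": "Efik", "ebi": "Ebira",
    "ego": "Eggon", "es": "Spanish", "fr": "French", "ha": "Hausa",
    "yo": "Yoruba", "ibb": "Ibibio", "idm": "Idoma", "igl": "Igala",
    "ig": "Igbo", "nup": "Nupe", "pg": "Pidgin", "tiv": "Tiv",
    "urh": "Urhobo",
}


def build_translation_messages(text: str, target_code: str) -> list:
    """Build system/user messages in the style of the Quick Start prompt."""
    if target_code not in LANGUAGES:
        raise ValueError(f"Unsupported language code: {target_code!r}")
    language = LANGUAGES[target_code]
    return [
        {"role": "system", "content": f"You are an expert {language} translator."},
        {"role": "user", "content": f"Translate to {language}: {text}"},
    ]
```

Rejecting unknown codes up front keeps requests for unsupported languages from reaching the model, where output may be unreliable.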
Training Data
Training data comprises 15.9 million examples assembled from eleven Hypa Intelligence sub-datasets, each contributing a different signal:
- Synthetic_Dictionary_Text_ONLY, Synthetic_Dictionary_FF_CC_Text_ONLY, JSON_Dictionary_Text_ONLY — three views (prose, cloze, structured JSON) of a curated multilingual dictionary, providing lexical grounding across all target languages.
- Fleurs9000_Text_ONLY — ~9,000 text-only examples derived from FLEURS, providing parallel translations across the language set.
- CommonVoice_35k_Text_ONLY, CommonVoice_15k_Text_ONLY — transcript-only data drawn from Mozilla CommonVoice, providing real-world spoken-language distribution as text.
- cv_15k_translation, cv_15k_detection — CommonVoice transcripts reformulated into translation and language-detection tasks.
- cbp_translation, cbp_detection — parallel translation and detection pairs from a community-sourced parallel corpus with broad coverage of the smaller languages.
- gpt_oss_synthetic — general-purpose synthetic instruction data generated using open-source LLMs, included specifically to mitigate catastrophic forgetting of the base model's instruction-following ability.
A public 10k subset of the training data is released as hypaai/Hypa-Text-10k. Additional sub-datasets are progressively being released under the hypaai organization.
Prompt Formatting
Every example was formatted using Gemma 4's native chat template, with explicit system, user, and model roles and dedicated control tokens for thinking (<|think|>, <|channel>thought ... <channel|>) and tool use. Loss was computed only on assistant turns via train_on_responses_only with instruction_part="<|turn>user\n" and response_part="<|turn>model\n".
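The idea behind response-only loss can be illustrated with a toy label mask. This is not Unsloth's implementation: it treats each turn marker as a single token for readability, whereas a real tokenizer splits the markers into several ids, and the `mask_non_response` helper and the placeholder tokens are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

INSTRUCTION_PART = "<|turn>user\n"
RESPONSE_PART = "<|turn>model\n"


def mask_non_response(tokens, token_ids):
    """Return labels that keep token ids only inside model turns."""
    labels = []
    in_response = False
    for token, token_id in zip(tokens, token_ids):
        if token == RESPONSE_PART:
            in_response = True
            labels.append(IGNORE_INDEX)  # the marker itself is not a target
        elif token == INSTRUCTION_PART:
            in_response = False
            labels.append(IGNORE_INDEX)
        else:
            labels.append(token_id if in_response else IGNORE_INDEX)
    return labels


# Two user turns (Q*) and two model turns (A*), with ids 10..19.
tokens = [INSTRUCTION_PART, "Q1", "Q2", RESPONSE_PART, "A1", "A2",
          INSTRUCTION_PART, "Q3", RESPONSE_PART, "A3"]
token_ids = list(range(10, 20))

print(mask_non_response(tokens, token_ids))
# [-100, -100, -100, -100, 14, 15, -100, -100, -100, 19]
```

Only the positions belonging to model turns keep their ids, so gradients come from the assistant's output and not from the system or user text.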
Training Procedure
| Hyperparameter | Value |
|---|---|
| LoRA rank (r) | 1024 |
| LoRA alpha (α) | 1024 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Vision / audio modules | frozen |
| Quantization | 4-bit base, bf16 compute |
| Optimizer | AdamW 8-bit |
| Learning rate | 1e-5 |
| LR schedule | cosine, 500 warmup steps |
| Weight decay | 0.001 |
| Max grad norm | 1.0 |
| Per-device batch size | 32 |
| Gradient accumulation | 2 |
| Effective batch size | 64 |
| Sequence length | 2048 |
| Epochs | 1 |
| Total steps | 248,862 |
| Precision | bfloat16 |
| Gradient checkpointing | enabled (Unsloth) |
| Hardware | 1× NVIDIA H200 SXM (Runpod) |
| Runtime | 7.41 days |
| Compute (approx) | 2.11 × 10²⁰ FLOPs |
| Random seed | 3407 |
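
The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below uses only values from the table; the scaling factor assumes standard LoRA scaling (alpha divided by rank), and `lora_param_count` is a hypothetical helper showing the usual per-layer parameter formula.

```python
# Values copied from the hyperparameter table.
PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1
TOTAL_STEPS = 248_862
RUNTIME_DAYS = 7.41
LORA_RANK = 1024
LORA_ALPHA = 1024

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
examples_seen = effective_batch * TOTAL_STEPS
seconds_per_step = RUNTIME_DAYS * 86_400 / TOTAL_STEPS
lora_scaling = LORA_ALPHA / LORA_RANK  # assumes standard alpha / r scaling


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


print(f"effective batch size: {effective_batch}")
print(f"examples seen in one epoch: {examples_seen:,}")
print(f"average seconds per step: {seconds_per_step:.2f}")
print(f"LoRA scaling factor: {lora_scaling}")
```

The product of effective batch size and total steps comes to 15,927,168, which agrees with the 15.9M training examples reported for a single epoch.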
Training was performed using Unsloth, which provides day-zero Gemma 4 support and patches the shared-KV-cache interaction with gradient checkpointing that affects naive QLoRA setups in vanilla transformers.
Evaluation and Recommendations
Training metrics
- Final training loss: 0.41 (smooth monotonic decay)
- Best evaluation loss: 3.128 at step 40,000
- Final evaluation loss: 3.577
Honest note on overfitting
Evaluation loss bottomed out at step 40,000 (≈16% of total training) and rose steadily for the remaining ~209,000 steps, ending ~14% above its best value. The training loss continued to decrease throughout. This is a clear overfitting signature: the model memorized the training distribution beyond the point where additional optimization improved generalization.
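The percentages above follow directly from the two reported evaluation points. A minimal sketch, assuming the final evaluation was taken at the last training step (248,862); the helper names are hypothetical, and no intermediate evaluation values are invented.

```python
# The two evaluation points reported above, as (step, eval_loss).
BEST = (40_000, 3.128)
FINAL = (248_862, 3.577)  # assumes the final eval coincides with the last step


def relative_rise(best_loss: float, final_loss: float) -> float:
    """Fractional increase of the final loss over the best loss."""
    return (final_loss - best_loss) / best_loss


def select_best(history):
    """Pick the (step, eval_loss) pair with the lowest evaluation loss."""
    return min(history, key=lambda point: point[1])


fraction_of_training = BEST[0] / FINAL[0]
steps_after_best = FINAL[0] - BEST[0]

print(f"best checkpoint: step {select_best([BEST, FINAL])[0]:,}")
print(f"share of training at best eval: {fraction_of_training:.1%}")
print(f"steps after the best eval: {steps_after_best:,}")
print(f"eval loss rise from best to final: {relative_rise(BEST[1], FINAL[1]):.1%}")
```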
For downstream use, we recommend the LoRA checkpoint at step 40,000 rather than the final merged 16-bit weights published in this repository. The intermediate checkpoint is available at hypaai/Hypa-Gemma4-E2B-v1-LoRAs. The merged 16-bit weights in this repo represent the end-of-training state and are useful as a reference but are not the strongest checkpoint by evaluation loss.
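One possible way to attach the recommended checkpoint to the base model is sketched below with PEFT. Several things here are assumptions: the `checkpoint-<step>` subfolder name follows the Hugging Face Trainer convention and may not match the layout of the adapter repository, and adapters trained through Unsloth may need to be loaded through Unsloth if module names differ. The helper names are hypothetical.

```python
from typing import Any, Tuple

BASE_MODEL_ID = "google/gemma-4-E2B-it"              # base model named in this card
ADAPTER_REPO_ID = "hypaai/Hypa-Gemma4-E2B-v1-LoRAs"  # companion repo named above


def checkpoint_subfolder(step: int) -> str:
    """Assumed folder name, following the Trainer `checkpoint-<step>` convention."""
    return f"checkpoint-{step}"


def load_adapter_checkpoint(step: int = 40_000) -> Tuple[Any, Any]:
    """Attach one LoRA checkpoint to the base model (downloads weights when called)."""
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID, dtype=torch.bfloat16, device_map="auto"
    )
    model = PeftModel.from_pretrained(
        base, ADAPTER_REPO_ID, subfolder=checkpoint_subfolder(step)
    )
    return model, tokenizer
```

Calling `model, tokenizer = load_adapter_checkpoint()` would then give a model that can be used in place of the merged weights in the Quick Start, provided the repository layout matches the assumption above.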
The full discussion of why this happened and what we will change for v2 is in the public write-up. Headline planned fixes: lower LoRA rank, load_best_model_at_end=True with eval_loss as the selection metric, language-held-out evaluation splits, and a lower step count.
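A hedged sketch of what the checkpoint-selection fix could look like as `transformers.TrainingArguments` keyword arguments is shown below. The step intervals and `save_total_limit` are placeholders, not values from the v1 run or a confirmed v2 plan, and `check_best_model_config` is a hypothetical helper that mirrors the consistency rules the Trainer enforces when best-model loading is enabled.

```python
# Candidate keyword arguments for transformers.TrainingArguments.
# Older transformers releases spell "eval_strategy" as "evaluation_strategy".
BEST_CHECKPOINT_KWARGS = {
    "eval_strategy": "steps",
    "eval_steps": 2_000,          # placeholder interval
    "save_strategy": "steps",
    "save_steps": 2_000,          # placeholder interval
    "save_total_limit": 3,        # placeholder retention limit
    "load_best_model_at_end": True,
    "metric_for_best_model": "eval_loss",
    "greater_is_better": False,   # lower evaluation loss is better
}


def check_best_model_config(kwargs: dict) -> None:
    """Raise ValueError if the settings cannot track the best checkpoint."""
    if not kwargs.get("load_best_model_at_end"):
        raise ValueError("load_best_model_at_end must be True")
    if kwargs.get("eval_strategy") != kwargs.get("save_strategy"):
        raise ValueError("eval_strategy and save_strategy must match")
    if kwargs["save_steps"] % kwargs["eval_steps"] != 0:
        raise ValueError("save_steps must be a multiple of eval_steps")


check_best_model_config(BEST_CHECKPOINT_KWARGS)
```

With settings like these, the checkpoint with the lowest evaluation loss is reloaded at the end of training, so the published weights would correspond to the best evaluation point and not the last step.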
Qualitative observations
Internal qualitative review on translation tasks shows substantial improvements over the base Gemma 4 E2B for every language in the set, with the largest deltas on the smallest languages (Annang, Efik, Ibibio), where the base model was effectively unusable. Quantitative chrF++, BLEU, and BLEURT results across language pairs will follow in a separate evaluation post.
Intended Use
Direct use cases:
- Translation between English / French / Spanish and the fourteen covered low-resource languages
- Language detection across all seventeen languages
- Dictionary-style lexical lookup and explanation
- Multilingual instruction-following on dialogue tasks
- Tool-aware / function-calling-style prompting (inheriting the base model's structure)
Downstream use:
- Suitable as a starting point for further fine-tuning on more specialized tasks within the supported languages
- Suitable for adapter stacking (e.g., domain-specific LoRA on top)
- Suitable for on-device deployment when quantized (E2B at 4-bit fits on mid-range mobile hardware)
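
To make the on-device claim concrete, weight storage can be estimated from the parameter counts given in the Model Description table. This is a back-of-the-envelope sketch: it ignores quantization scales, activations, and the KV cache, and which of the two parameter counts governs the real footprint depends on how a given runtime stores the per-layer embeddings.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Parameter counts from the Model Description table.
EFFECTIVE_PARAMS = 2.3e9
PARAMS_WITH_PLE = 5.1e9

for label, count in [("effective", EFFECTIVE_PARAMS), ("with PLE", PARAMS_WITH_PLE)]:
    for bits in (16, 4):
        size = weight_memory_gb(count, bits)
        print(f"{label:>9} @ {bits:>2}-bit: {size:.2f} GB")
```

At 4 bits per weight the estimate is about 1.15 GB for the effective parameters and about 2.55 GB when the per-layer embeddings are counted.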
Out-of-Scope and Limitations
- Not safety-tuned for sensitive domains. This model has not undergone RLHF or DPO post-training. It should not be used unsupervised for medical, legal, financial, or psychological-counseling applications.
- Quality varies by language. The smallest languages in the set are underrepresented even within our training mix and the resulting model output should be reviewed by native speakers before being used in production.
- Overfitting on the final checkpoint. As noted above, the merged 16-bit weights in this repository correspond to the end of training, not the best evaluation checkpoint. For applications that prioritize generalization, use the step-40,000 LoRA from the companion repository.
- Vision and audio components are frozen. This is a text-only fine-tune. The base model's image and audio capabilities are preserved at the underlying weight level but were not exercised during training and have not been validated for our target languages.
- Tokenization quality. Gemma 4's 256K-vocabulary tokenizer fragments low-resource languages less aggressively than smaller-vocabulary tokenizers, but the smallest languages in this release still tokenize at higher cost per character than English. This is the gap we expect future iterations to close.
- Coverage is finite. The seventeen languages in this release are the start, not the end. Many other underrepresented languages are not yet supported and may produce unreliable output.
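
The tokenization gap mentioned in the list above can be measured on your own text. The sketch below is tokenizer-agnostic: `encode` is any callable that maps a string to a sequence of tokens, and both helper names are hypothetical.

```python
from typing import Callable, Sequence


def tokens_per_char(encode: Callable[[str], Sequence], text: str) -> float:
    """Average number of tokens produced per character of text."""
    if not text:
        raise ValueError("text must not be empty")
    return len(encode(text)) / len(text)


def relative_cost(
    encode: Callable[[str], Sequence], text: str, reference_text: str
) -> float:
    """Tokens per character of `text` relative to `reference_text`."""
    return tokens_per_char(encode, text) / tokens_per_char(encode, reference_text)
```

With the Quick Start tokenizer, `encode` could be `lambda s: tokenizer(s, add_special_tokens=False)["input_ids"]`. Comparing a sentence in a target language against its English translation then gives a rough per-character cost ratio; a value above 1 means the target language is more expensive to tokenize.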
Bias, Risks, and Limitations
This model inherits the biases and limitations of its base model (Google Gemma 4) and adds the biases of its fine-tuning corpus, which is weighted toward dictionary, religious-parallel, and CommonVoice text. Religious-parallel text in particular is a known cause of register and content bias in low-resource translation models. Users deploying this model in customer-facing applications should evaluate output for cultural appropriateness in their specific use case and language.
The model is not intended to make decisions affecting people's rights, health, finances, or wellbeing. Like all language models, it can produce confident-sounding output that is incorrect, particularly on the smallest languages where training data was thinnest.
Released Artifacts
- 🤗 Merged 16-bit model (this repo): hypaai/Hypa-Gemma4-E2B-v1
- 🤗 LoRA adapter checkpoints: hypaai/Hypa-Gemma4-E2B-v1-LoRAs
- 📊 TensorBoard metrics: view on HF
- 📦 Public training data subset: hypaai/Hypa-Text-10k
- 💻 GitHub repository: hypaai/Hypa-Gemma
- 📝 Blog post: Tuning Gemma 4 for multilingual and tool-aware language understanding (Hashnode mirror)
Citation
If you use Hypa-Gemma4 E2B or any of the related work, please cite:
@misc{hypaai2026hypagemma4e2b,
title = {Hypa-Gemma4 E2B: A Multilingual Fine-Tune of Gemma 4 for Underrepresented Languages},
author = {{Hypa Intelligence}},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/hypaai/Hypa-Gemma4-E2B-v1}},
note = {Apache 2.0 License. Blog: \url{https://hypaintelligence.com/updates/tuning-gemma-4-for-multilingual-and-tool-aware-language-understanding}}
}
License
Released under the Apache License 2.0, inheriting the license of the Gemma 4 base model. Free to use, modify, and redistribute for both research and commercial purposes, with no monthly active user caps. Redistribution only needs to retain the license text and existing notices, as Apache 2.0 requires.
Acknowledgments
- Google DeepMind for releasing Gemma 4 under Apache 2.0 and for its 140+ language pretraining commitment.
- Unsloth for day-zero Gemma 4 support and for handling the KV-sharing interaction with gradient checkpointing.
- Runpod for reliable H200 infrastructure.
- The language communities, speakers, and reviewers whose texts, voices, and feedback grounded this work and keep it honest.
Hypa Intelligence • Website • Hugging Face • GitHub • Blog
Multilingualism is not a feature. It is a prerequisite for AI that represents all of us.