Qwen3-4B-Islamic-Arabic
Qwen3-4B fine-tuned on Islamic Arabic Q&A via QLoRA — merged FP16, ready for direct inference.
This is the canonical, fully merged version of a Qwen3-4B model fine-tuned on 17,944 high-quality Islamic Arabic question-answer pairs spanning Fiqh, Fatwa, Aqeedah, Quran Sciences, and Islamic Finance. The LoRA adapter has been merged into the base weights and saved in FP16; no additional adapter loading is required.
Trained by Yahya Alnwsany (NightPrince) — 2026-05-05.
Model Variants
| Variant | Repo | Description |
|---|---|---|
| Merged FP16 (this model) | NightPrince/Qwen3-4B-Islamic-Arabic | Canonical merged model, FP16, ~7.6 GB — drop-in for transformers or vLLM |
| LoRA Adapter | NightPrince/Qwen3-4B-Islamic-Arabic-LoRA | PEFT adapter only, 264 MB — apply on top of Qwen/Qwen3-4B |
| INT4 Quantized | NightPrince/Qwen3-4B-Islamic-Arabic-INT4 | W4A16 compressed-tensors for fast vLLM serving, 2.5 GB |
| MLX 4-bit | NightPrince/Qwen3-4B-Islamic-Arabic-mlx-4Bit | Apple Silicon / MLX — native Mac inference, 4-bit quantized |
| GGUF | NightPrince/Qwen3-4B-Islamic-Arabic-GGUF | llama.cpp / Ollama / LM Studio — Q4_K_M (2.3 GB), Q8_0 (4.0 GB), F16 (7.5 GB) |
| Dataset | NightPrince/islamic-arabic-qa | 17,944 train / 2,101 val / 1,042 test — Islamic Arabic Q&A pairs |
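
The LoRA Adapter row says the adapter is applied on top of Qwen/Qwen3-4B. A minimal sketch of that workflow with PEFT follows. The helper name `load_adapter_model` is hypothetical, and the dtype, `device_map`, and the choice to load the tokenizer from the base repo are assumptions borrowed from the Transformers example in this card, not settings documented for the adapter repo.

```python
BASE_ID = "Qwen/Qwen3-4B"
ADAPTER_ID = "NightPrince/Qwen3-4B-Islamic-Arabic-LoRA"


def load_adapter_model(merge: bool = False):
    """Load the base model and attach the LoRA adapter (hypothetical helper)."""
    # Imports are deferred so that defining the helper does not need a GPU stack.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter does not change the tokenizer, so the base one is used.
    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_ID,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    if merge:
        # Fold the adapter into the base weights, returning a plain Transformers model.
        model = model.merge_and_unload()
    return tokenizer, model
```

Calling `load_adapter_model(merge=True)` should give a model comparable to the merged FP16 variant, at the cost of downloading the base weights separately.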
Training Metrics
Loss Curve
| Checkpoint | Train Loss | Eval Loss |
|---|---|---|
| Step 0 (init) | — | — |
| Step 843 (final) | 1.8918 | 2.4094 (best) |
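
For intuition, the losses above can be converted to perplexity. This is a small sketch that assumes the reported values are mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated explicitly in this card.

```python
import math

# Final losses from the table above.
train_loss = 1.8918
eval_loss = 2.4094

# Perplexity is the exponential of the mean per-token cross-entropy (assumed in nats).
train_ppl = math.exp(train_loss)
eval_ppl = math.exp(eval_loss)

print(f"train perplexity ~ {train_ppl:.2f}")
print(f"eval perplexity  ~ {eval_ppl:.2f}")
```

Under that assumption the train perplexity is roughly 6.6 and the eval perplexity roughly 11.1.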
Token Accuracy
| Phase | Token Accuracy |
|---|---|
| Early training | ~50% |
| End of training | ~60% |
MCQ evaluation coming soon: a multiple-choice benchmark for the Islamic domain is prepared but requires serving the model via vLLM. Results will be posted here once available.
Usage
Transformers Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "NightPrince/Qwen3-4B-Islamic-Arabic"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Arabic system prompt: act as a specialized Islamic scholar assistant, answer
# from the Quran, Sunnah, and classical Fiqh, cite sources, be concise but thorough.
SYSTEM_PROMPT = (
    "أنت مساعد عالم إسلامي متخصص. "
    "أجب على الأسئلة بدقة استناداً إلى القرآن الكريم والسنة النبوية والفقه الإسلامي الكلاسيكي. "
    "استشهد بالمصادر حيثما أمكن. كن موجزاً لكن شاملاً."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    # User question: "What is the ruling on zakat for saved money?"
    {"role": "user", "content": "ما حكم الزكاة على المال المدخر؟"},
]

# Render the conversation with the model's chat template, ending with the assistant prompt.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
vLLM Serving
The merged FP16 model is ~7.6 GB. Use at least tensor_parallel_size=2 on 11 GB GPUs (e.g., RTX 2080 Ti), or a single 24 GB+ GPU.
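
The sizing advice above can be sanity-checked with rough arithmetic. This sketch assumes tensor parallelism splits the weights evenly across GPUs and that vLLM is left at its default GPU memory utilization of 0.9. It ignores activation memory, CUDA graph overhead, and the GB versus GiB distinction, so treat the numbers as an estimate only.

```python
MODEL_GB = 7.6             # merged FP16 checkpoint size stated above
GPU_GB = 11.0              # e.g. RTX 2080 Ti
GPU_MEM_UTILIZATION = 0.9  # assumption: vLLM's default --gpu-memory-utilization


def headroom_per_gpu(tensor_parallel_size: int) -> float:
    """Rough GB left per GPU for KV cache and activations after loading weights."""
    # Assumption: weights are sharded evenly across the tensor-parallel group.
    weights_per_gpu = MODEL_GB / tensor_parallel_size
    return GPU_GB * GPU_MEM_UTILIZATION - weights_per_gpu


for tp in (1, 2):
    print(f"tp={tp}: ~{headroom_per_gpu(tp):.1f} GB headroom per GPU")
```

With these assumptions a single 11 GB GPU keeps only about 2.3 GB for the KV cache and everything else, while two GPUs keep about 6.1 GB each.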
```bash
# Install vLLM if needed
pip install vllm

# Serve with tensor parallelism across 2 GPUs
vllm serve NightPrince/Qwen3-4B-Islamic-Arabic \
  --dtype float16 \
  --tensor-parallel-size 2 \
  --max-model-len 4096 \
  --port 8000
```
Query the running server:
```python
from openai import OpenAI

# The key is a placeholder; vLLM only checks it if the server was started with --api-key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

# Same Arabic system prompt as in the Transformers example.
SYSTEM_PROMPT = (
    "أنت مساعد عالم إسلامي متخصص. "
    "أجب على الأسئلة بدقة استناداً إلى القرآن الكريم والسنة النبوية والفقه الإسلامي الكلاسيكي. "
    "استشهد بالمصادر حيثما أمكن. كن موجزاً لكن شاملاً."
)

response = client.chat.completions.create(
    model="NightPrince/Qwen3-4B-Islamic-Arabic",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "ما حكم الزكاة على المال المدخر؟"},
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response.choices[0].message.content)
```
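
If installing the `openai` package is not an option, the same OpenAI-compatible endpoint can be called with the standard library alone. This is a sketch, not part of the original card: the helper names `build_payload` and `ask` are hypothetical, and the URL assumes the server was started on port 8000 as in the serve command above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumes the vLLM server from the command above
MODEL_ID = "NightPrince/Qwen3-4B-Islamic-Arabic"


def build_payload(question: str, system_prompt: str, max_tokens: int = 512) -> dict:
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }


def ask(question: str, system_prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    body = json.dumps(build_payload(question, system_prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

`ask(question, SYSTEM_PROMPT)` needs a running server, while `build_payload` can be inspected offline to check the request shape.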
Prefer lower memory? Use the INT4 quantized variant (2.5 GB) for vLLM or the GGUF variant for llama.cpp / Ollama.
Training Details
Dataset
| Property | Value |
|---|---|
| Dataset | NightPrince/islamic-arabic-qa |
| Train split | 17,944 samples |
| Validation split | 2,101 samples |
| Test split | 1,042 samples |
| Language | Arabic (Modern Standard + Classical) |
| Domains | Fiqh, Fatwa, Aqeedah, Quran Sciences, Islamic Finance |
| Quality filter | Applied — deduplication, length filtering, domain relevance scoring |
| Format | Instruction-following (system / user / assistant) |
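
As a quick check on the split sizes in the table, the proportions can be computed directly. This sketch uses only the three counts listed above.

```python
# Split sizes from the Dataset table.
splits = {"train": 17_944, "validation": 2_101, "test": 1_042}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

print(f"total samples: {total}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The counts sum to 21,087 samples, split roughly 85% / 10% / 5%.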
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Per-device batch size | 1 |
| Gradient accumulation steps | 16 |
| Effective batch size | 64 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with warmup |
| Warmup ratio | 0.05 |
| Max sequence length | 1,024 tokens |
| Optimizer | AdamW (paged, 8-bit) |
| Precision | QLoRA (4-bit base + BF16 adapters) |
| Gradient checkpointing | Enabled |
| Loss masking | Assistant turns only (assistant_only_loss=True) |
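
The loss-masking row means only assistant tokens contribute to the training loss. The idea can be illustrated with a toy sketch; this is a conceptual example and not TRL's implementation. The token ids and role tags are made up, and the helper `mask_non_assistant` is hypothetical. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_non_assistant(token_ids, roles):
    """Return labels that keep assistant tokens and ignore everything else."""
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]


# Toy sequence: the ids and per-token role tags are invented for illustration.
token_ids = [11, 12, 21, 22, 23, 31, 32]
roles = ["system", "system", "user", "user", "user", "assistant", "assistant"]

labels = mask_non_assistant(token_ids, roles)
print(labels)  # [-100, -100, -100, -100, -100, 31, 32]
```

Positions labeled -100 are skipped by the loss, so system and user text shapes the context without being a prediction target.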
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 64 |
| Alpha (α) | 128 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable parameters | 132,120,576 |
| % of total parameters | 5.65% as reported; 132,120,576 / 4.15B ≈ 3.18% |
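
The trainable parameter count can be reproduced from the rank and target modules. The sketch below assumes the Qwen3-4B layer dimensions shown in the constants (hidden size, MLP size, layer count, head layout). Those values do not appear in this card and are taken as assumptions about the base model's config. Each LoRA pair adds `r * (in_features + out_features)` parameters.

```python
# Assumed Qwen3-4B dimensions (not listed in this card).
HIDDEN = 2560
INTERMEDIATE = 9728
NUM_LAYERS = 36
Q_OUT = 32 * 128   # attention heads * head_dim
KV_OUT = 8 * 128   # key/value heads * head_dim
RANK = 64          # LoRA rank from the table above

# (in_features, out_features) for each targeted projection.
TARGETS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """Parameters in one LoRA pair: A is rank x in, B is out x rank."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
adapter_mb = total * 2 / 1e6  # 2 bytes per parameter in 16-bit precision

print(f"trainable parameters: {total:,}")
print(f"adapter size at 16-bit: ~{adapter_mb:.0f} MB")
```

Under these assumptions the total matches the 132,120,576 in the table, and the 16-bit size lines up with the 264 MB listed for the LoRA adapter variant.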
Results
| Metric | Value |
|---|---|
| Final train loss | 1.8918 |
| Best eval loss | 2.4094 |
| Total training steps | 843 |
| Training duration | 7.59 hours |
| Token accuracy (start → end) | ~50% → ~60% |
| MCQ benchmark | Coming soon (requires vLLM serving) |
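
The step count and effective batch size follow from the hyperparameters. A short sketch of the arithmetic, assuming one process per GPU (4 GPUs, per the Hardware table) and that a final partial batch still counts as an optimizer step:

```python
import math

train_samples = 17_944
per_device_batch = 1
grad_accum_steps = 16
num_gpus = 4          # from the Hardware table
epochs = 3
duration_hours = 7.59

# Samples consumed per optimizer step across all GPUs.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
# Assumption: the last, partial batch of each epoch still triggers a step.
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs
seconds_per_step = duration_hours * 3600 / total_steps

print(effective_batch, steps_per_epoch, total_steps)  # 64 281 843
print(f"~{seconds_per_step:.1f} s per optimizer step")
```

This reproduces the effective batch size of 64 and the 843 total steps, and implies a little over 32 seconds per optimizer step.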
Hardware
| Component | Specification |
|---|---|
| GPUs | 4× NVIDIA GeForce RTX 2080 Ti (11 GB VRAM each, 44 GB total) |
| CUDA version | 13.0 |
| Training framework | DDP via Hugging Face Accelerate |
Software Environment
| Library | Version |
|---|---|
| Python | 3.11.15 |
| PyTorch | 2.11.0+cu130 |
| Transformers | 4.57.6 |
| PEFT | 0.18.1 |
| TRL | 1.3.0 |
| BitsAndBytes | 0.49.2 |
| Accelerate | 1.13.0 |
Limitations
- Domain scope: The model is optimized for Islamic Arabic Q&A. General Arabic tasks or non-Islamic domains may show degraded quality compared to the base Qwen3-4B.
- Source attribution: While the model is trained to cite sources, citations should be independently verified — the model can produce plausible-sounding but incorrect references.
- Classical vs. contemporary Fiqh: The training data emphasizes classical scholarship. Contemporary jurisprudential debates, especially minority or regional opinions, may be underrepresented.
- Language: The model performs best in Arabic (Modern Standard and Classical). Responses in other languages are not guaranteed to be accurate or fluent.
Citation
```bibtex
@misc{alnwsany2026qwen3islamicarbic,
  author       = {Yahya Alnwsany},
  title        = {Qwen3-4B-Islamic-Arabic: QLoRA Fine-Tuning of Qwen3-4B on Islamic Arabic Q\&A},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/NightPrince/Qwen3-4B-Islamic-Arabic}},
  note         = {Base model: Qwen/Qwen3-4B. Dataset: NightPrince/islamic-arabic-qa.}
}
```
License
This model is released under the Apache 2.0 license, consistent with the base model Qwen/Qwen3-4B. See LICENSE for details.