# Sarvam-30B 4-Bit (BitsAndBytes)

This repository provides a 4-bit NF4 quantized version of the base model sarvamai/sarvam-30b, produced with bitsandbytes. The quantization significantly reduces GPU memory usage while largely preserving inference quality.
## Base Model

- Original model: `sarvamai/sarvam-30b`
- Architecture: `SarvamMoEForCausalLM`
## Quantization Details

Quantization method: bitsandbytes 4-bit (NF4)

Configuration used:

- `load_in_4bit = True`
- `bnb_4bit_quant_type = "nf4"`
- `bnb_4bit_compute_dtype = torch.float16`
- `bnb_4bit_use_double_quant = True`
Approximate GPU memory usage:

| Model | GPU VRAM |
|---|---|
| FP16 original | ~60 GB |
| 4-bit NF4 | ~16-18 GB |
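The 4-bit figure can be sanity-checked with back-of-envelope arithmetic: 30B parameters at 4 bits each is roughly 15 GB of weights, and runtime overhead accounts for the rest:

```python
# Back-of-envelope VRAM estimate for the 4-bit weights.
params = 30e9           # ~30B parameters
bits_per_param = 4      # NF4 stores each weight in 4 bits
weight_bytes = params * bits_per_param / 8
weight_gb = weight_bytes / 1e9
print(f"weights alone: ~{weight_gb:.0f} GB")  # ~15 GB
# CUDA context, activations, and the KV cache add a few GB on top,
# which matches the ~16-18 GB figure in the table above.
```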
This version is recommended for most users who want to run the model with reduced hardware requirements.
## Installation

Install the required libraries:

```bash
pip install transformers accelerate bitsandbytes torch safetensors
```
## Loading the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "neuralnets/sarvam-30b-4bit",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "neuralnets/sarvam-30b-4bit",
    trust_remote_code=True,
)
```
## Example Inference

```python
prompt = "Explain mixture of experts in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Hardware Requirements

Recommended GPUs:

- A100 40 GB or 80 GB
- RTX 4090
- RTX 3090 (with CPU offloading)

CPU RAM recommendation: 32 GB or higher
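On a 24 GB card such as the RTX 3090, offloading works by capping per-device memory so Accelerate spills the remaining layers to CPU RAM. A hypothetical sketch using the `max_memory` convention (the exact budgets are illustrative and should be tuned for your system):

```python
# Illustrative per-device memory budgets; tune for your hardware.
max_memory = {
    0: "20GiB",      # leave headroom on GPU 0 for activations
    "cpu": "30GiB",  # spill the remaining layers to CPU RAM
}

# Pass max_memory=max_memory together with device_map="auto" to
# AutoModelForCausalLM.from_pretrained to enable CPU offloading.
print(sorted(max_memory, key=str))
```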
## Notes

- This model uses bitsandbytes quantization integrated into Hugging Face Transformers.
- The Sarvam architecture requires `trust_remote_code=True`.
- Designed primarily for inference workloads.
## Base Model

Original model: `sarvamai/sarvam-30b`

Please refer to the base repository for model training details and benchmarks.
## License

This repository distributes a quantized derivative of the original model. Users must follow the license of the upstream model, `sarvamai/sarvam-30b`.