sarvam-105b-AWQ

Model Overview

  • Model Architecture: sarvamai/sarvam-105b
    • Input: Text
    • Output: Text
  • Model Optimizations:
    • Weight quantization: AWQ
  • Out-of-scope: Use in any manner that violates applicable laws or regulations (including trade compliance laws).
  • Version: 1.0
  • Model Developers: QuantTrio

This model was quantized with llm-compressor, using sarvamai/indivibe as the calibration dataset.
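For reference, llm-compressor runs are typically driven by a recipe. A representative sketch is below; the bit width, group size, and ignore list are illustrative assumptions, not the exact settings used for this checkpoint:

```yaml
# Illustrative llm-compressor AWQ recipe (assumed settings, not the ones
# actually used for this model): 4-bit asymmetric weight quantization of
# Linear layers, keeping lm_head in full precision.
quant_stage:
  quant_modifiers:
    AWQModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          targets: ["Linear"]
          weights:
            num_bits: 4
            type: int
            symmetric: false
            group_size: 128
```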

Deployment

Use with vLLM

This model can be deployed efficiently using the vLLM backend.

1: Hot-patch (easy)

Run hotpatch_vllm.py. The script will:

  • install vllm==0.15.0
  • add two model entries to vLLM's registry.py
  • download the model executors for sarvam-105b

2: Run vLLM

export OMP_NUM_THREADS=4

vllm serve \
    __YOUR_PATH__/QuantTrio/sarvam-105b-AWQ \
    --served-model-name MY_MODEL \
    --swap-space 16 \
    --max-num-seqs 32 \
    --max-model-len 32768  \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 4 \
    --enable-auto-tool-choice \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 8000
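Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal stdlib-only client sketch follows; the MY_MODEL name and the localhost:8000 endpoint mirror the --served-model-name, --host, and --port flags above:

```python
import json
import urllib.request

def build_request(prompt: str) -> urllib.request.Request:
    # Build an OpenAI-compatible chat completion request for the vLLM
    # server started above. "MY_MODEL" matches --served-model-name.
    payload = {
        "model": "MY_MODEL",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        "http://localhost:8000/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt: str) -> str:
    # Send the request and extract the assistant's reply.
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

The same endpoint also works with any OpenAI-compatible client library pointed at http://localhost:8000/v1.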

Model Files

File Size: 74GiB
Last Updated: 2026-03-12

Logs

2026-03-12
1. Initial commit
Safetensors

  • Model size: 19B params
  • Tensor types: F32, I64, I32
