Update README.md
README.md CHANGED

@@ -12,7 +12,7 @@ base_model:
 - **Output:** Text
 - **Supported Hardware Microarchitecture:** AMD MI350/MI355
 - **ROCm**: 7.0-Preview
-- **
+- **Operating System(s):** Linux
 - **Inference Engine:** [SGLang](https://docs.sglang.ai/)
 - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
 - **Weight quantization:** OCP MXFP4

@@ -33,6 +33,7 @@ This model was obtained by quantizing [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)
 cd Quark/examples/torch/language_modeling/llm_ptq/
 python3 quantize_quark.py --model_dir $MODEL_DIR \
     --quant_scheme w_mxfp4_a_mxfp4 \
+    --group_size 32 \
     --num_calib_data 128 \
     --exclude_layers "*mlp.gate.*" "*lm_head" \
     --multi_gpu \
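
The added `--group_size 32` matches the 32-element block size used by the OCP Microscaling (MX) formats, so each group of 32 weights shares one scale. The quantization command in the second hunk also assumes a local DeepSeek-R1 checkpoint (passed via `--model_dir`) and a checkout of the Quark example scripts; a minimal setup sketch is below. The package names, download step, and paths are assumptions for illustration, not instructions from the model card.

```shell
# Hypothetical setup for the quantization command above; names and paths are assumptions.
pip install amd-quark "huggingface_hub[cli]"     # assumed package names
huggingface-cli download deepseek-ai/DeepSeek-R1 \
    --local-dir ./DeepSeek-R1                    # original checkpoint
export MODEL_DIR=$(pwd)/DeepSeek-R1              # consumed via --model_dir
# The `cd Quark/examples/...` step assumes the Quark example scripts were
# obtained separately (e.g. from an AMD-Quark release).
```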
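
The model card names SGLang as the inference engine for the MI350/MI355 targets; a minimal serving sketch follows. The checkpoint path, tensor-parallel degree, host, and port are assumptions, not values taken from the README.

```shell
# Minimal sketch: serve the MXFP4-quantized checkpoint with SGLang.
# Path, --tp degree, host, and port are assumptions.
python3 -m sglang.launch_server \
    --model-path /path/to/quantized-DeepSeek-R1-MXFP4 \
    --tp 8 \
    --trust-remote-code \
    --host 0.0.0.0 --port 30000
```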