Update README.md
README.md CHANGED

@@ -12,7 +12,7 @@ base_model:
 - **Output:** Text
 - **Supported Hardware Microarchitecture:** AMD MI350/MI355
 - **ROCm**: 7.0-Preview
-- **
+- **Operating System(s):** Linux
 - **Inference Engine:** [SGLang](https://docs.sglang.ai/)
 - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
 - **Weight quantization:** OCP MXFP4

@@ -33,6 +33,7 @@ This model was obtained by quantizing [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)
 cd Quark/examples/torch/language_modeling/llm_ptq/
 python3 quantize_quark.py --model_dir $MODEL_DIR \
     --quant_scheme w_mxfp4_a_mxfp4 \
+    --group_size 32 \
     --num_calib_data 128 \
     --exclude_layers "*mlp.gate.*" "*lm_head" \
     --multi_gpu \
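
The added `--group_size 32` matches the 32-element block size used by the OCP Microscaling (MX) formats, so each group of 32 weights shares one scale. The quantization command in the second hunk also assumes a local DeepSeek-R1 checkpoint (passed via `--model_dir`) and a checkout of the Quark example scripts; a minimal setup sketch is below. The package names, download step, and paths are assumptions for illustration, not instructions from the model card.

```shell
# Hypothetical setup for the quantization command above; names and paths are assumptions.
pip install amd-quark "huggingface_hub[cli]"     # assumed package names
huggingface-cli download deepseek-ai/DeepSeek-R1 \
    --local-dir ./DeepSeek-R1                    # original checkpoint
export MODEL_DIR=$(pwd)/DeepSeek-R1              # consumed via --model_dir
# The `cd Quark/examples/...` step assumes the Quark example scripts were
# obtained separately (e.g. from an AMD-Quark release).
```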
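
The model card names SGLang as the inference engine for the MI350/MI355 targets; a minimal serving sketch follows. The checkpoint path, tensor-parallel degree, host, and port are assumptions, not values taken from the README.

```shell
# Minimal sketch: serve the MXFP4-quantized checkpoint with SGLang.
# Path, --tp degree, host, and port are assumptions.
python3 -m sglang.launch_server \
    --model-path /path/to/quantized-DeepSeek-R1-MXFP4 \
    --tp 8 \
    --trust-remote-code \
    --host 0.0.0.0 --port 30000
```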