---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1
---
# Model Overview

- **Model Architecture:** DeepSeek-R1
  - **Input:** Text
  - **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0-Preview
- **Preferred Operating System(s):** Linux
- **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
  - **Weight quantization:** OCP MXFP4
  - **Activation quantization:** OCP MXFP4
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model is a quantized version of [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), an auto-regressive Mixture-of-Experts (MoE) transformer language model. Both its weights and its activations are quantized to MXFP4 with [AMD-Quark](https://quark.docs.amd.com/latest/index.html).
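
OCP MXFP4 comes from the OCP Microscaling Formats (MX) v1.0 specification: values are grouped into blocks of 32, each element is stored as a 4-bit FP4 (E2M1) number, and the whole block shares one 8-bit power-of-two scale (E8M0). The NumPy sketch below follows the specification's reference conversion (shared exponent = floor(log2(max|v|)) minus 2, the exponent of the largest E2M1 value 6.0). It is illustrative only, not AMD-Quark's implementation; the function names are hypothetical, and ties are broken toward the smaller magnitude as a simplification.

```python
import numpy as np

# Non-negative magnitudes representable by OCP FP4 (E2M1)
FP4_E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK_SIZE = 32   # MX block size
EMAX_ELEM = 2     # exponent of the largest normal E2M1 value (6.0 = 1.5 * 2**2)

# 32 four-bit elements plus one 8-bit shared scale per block
BITS_PER_ELEMENT = (BLOCK_SIZE * 4 + 8) / BLOCK_SIZE


def mxfp4_quantize_block(values):
    """Return (shared_exp, fp4_values) for one block of at most 32 floats."""
    v = np.asarray(values, dtype=np.float64)
    if v.ndim != 1 or v.size > BLOCK_SIZE:
        raise ValueError("expected a 1-D block of at most 32 values")
    amax = np.max(np.abs(v))
    if amax == 0.0:
        return 0, np.zeros_like(v)
    # Shared power-of-two scale, clamped to the E8M0 exponent range
    shared_exp = int(np.floor(np.log2(amax))) - EMAX_ELEM
    shared_exp = max(-127, min(127, shared_exp))
    scaled = v / 2.0 ** shared_exp
    # Snap each magnitude to the nearest FP4 value (saturates at 6.0), keep the sign
    idx = np.abs(np.abs(scaled)[:, None] - FP4_E2M1_GRID[None, :]).argmin(axis=1)
    return shared_exp, np.sign(scaled) * FP4_E2M1_GRID[idx]


def mxfp4_dequantize_block(shared_exp, fp4_values):
    """Reconstruct approximate real values from one quantized block."""
    return np.asarray(fp4_values) * 2.0 ** shared_exp
```

For example, `mxfp4_quantize_block([1.0, -0.3, 2.5, 7.0])` returns shared exponent 0 and FP4 values `[1.0, -0.5, 2.0, 6.0]`: 7.0 saturates to the FP4 maximum. `BITS_PER_ELEMENT` evaluates to 4.25, the storage cost per quantized value including the shared scale.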
# Model Quantization

This model was obtained by quantizing the weights and activations of [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) to MXFP4 with the AutoSmoothQuant algorithm in [AMD-Quark](https://quark.docs.amd.com/latest/index.html).
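
AutoSmoothQuant builds on SmoothQuant, which moves quantization difficulty from activations to weights: each input channel j gets a scale s_j = max|X_j|^α / max|W_j|^(1−α), activations are divided by s and weights multiplied by s, so the layer output X·Wᵀ is mathematically unchanged. The sketch below shows only this base smoothing step with a fixed α; we assume AutoSmoothQuant's "auto" part selects the smoothing per layer, and that search is not reproduced here. `smooth_linear` is a hypothetical name, not an AMD-Quark API.

```python
import numpy as np


def smooth_linear(x, w, alpha=0.5, eps=1e-5):
    """SmoothQuant-style smoothing of one linear layer y = x @ w.T.

    x: activations, shape (tokens, in_features)
    w: weight, shape (out_features, in_features)
    Returns (x_smoothed, w_smoothed, scales).
    """
    # Per-input-channel ranges of activations and weights
    act_max = np.maximum(np.abs(x).max(axis=0), eps)
    w_max = np.maximum(np.abs(w).max(axis=0), eps)
    # alpha controls how much of the activation range is moved into the weights
    s = np.maximum(act_max ** alpha / w_max ** (1.0 - alpha), eps)
    # Dividing x and multiplying w by the same per-channel scale preserves x @ w.T
    return x / s, w * s, s
```

With α = 0.5 the smoothed activations and weights end up with identical per-channel maxima, which is what makes the outlier channels easier to quantize on both sides.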

**Quantization script:**

```
# First dequantize the FP8 pretrained checkpoint to BFloat16, then quantize
# the BFloat16 model ($MODEL_DIR) with the following script.
cd Quark/examples/torch/language_modeling/llm_ptq/
python3 quantize_quark.py --model_dir $MODEL_DIR \
                          --quant_scheme w_mxfp4_a_mxfp4 \
                          --num_calib_data 128 \
                          --exclude_layers "*mlp.gate.*" "*lm_head" \
                          --multi_gpu \
                          --quant_algo autosmoothquant \
                          --model_export hf_format \
                          --output_dir amd/DeepSeek-R1-MXFP4
```
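
The first step above dequantizes the FP8 checkpoint. As we understand the DeepSeek-R1 checkpoint layout, each FP8 (E4M3) weight matrix is paired with a `weight_scale_inv` tensor that holds one scale per 128×128 block, and dequantization multiplies every block by its scale. The NumPy sketch below assumes that layout; it operates on already-decoded float values because NumPy has no FP8 or BF16 dtype, and `dequantize_blockwise` is a hypothetical helper rather than part of AMD-Quark or DeepSeek's tooling.

```python
import numpy as np

BLOCK = 128  # assumed: one scale per 128x128 weight tile


def dequantize_blockwise(w_q: np.ndarray, scale_inv: np.ndarray, block: int = BLOCK) -> np.ndarray:
    """Multiply each block x block tile of w_q by its per-tile scale (float32 output)."""
    m, n = w_q.shape
    expected = (-(-m // block), -(-n // block))  # ceil division per dimension
    if scale_inv.shape != expected:
        raise ValueError(f"scale shape {scale_inv.shape} != expected {expected}")
    # Expand each per-tile scale over its tile, then crop the padding at the edges
    s = np.repeat(np.repeat(scale_inv, block, axis=0), block, axis=1)[:m, :n]
    return w_q.astype(np.float32) * s.astype(np.float32)
```

In a real conversion the result would then be cast to BFloat16 (for example with PyTorch) before being passed to `quantize_quark.py` as `$MODEL_DIR`.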
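
# Deployment

## Use with SGLang

This model can be served with the [SGLang](https://docs.sglang.ai/) backend. The AIME2024 reproduction steps include an example launch command (`python3 -m sglang.launch_server` with 8-way tensor parallelism).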
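
A running SGLang server exposes an OpenAI-compatible HTTP API; the evaluation commands point lm-evaluation-harness at `http://localhost:30000/v1/completions`. The standard-library client sketch below assumes that default port, the `amd/DeepSeek-R1-MXFP4` model name used in those commands, and the same sampling settings as the AIME2024 run; `build_completion_request` and `complete` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000"  # assumed: SGLang default port, as in the eval commands
MODEL = "amd/DeepSeek-R1-MXFP4"


def build_completion_request(prompt, max_tokens=256, temperature=0.6, top_p=0.95,
                             base_url=BASE_URL):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

For example, `complete("What is 2 + 2?", max_tokens=512)` returns the model's raw completion text, including any reasoning it emits before the answer.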

# Evaluation

The model was evaluated on AIME2024, GPQA Diamond, and GSM8K with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). For AIME2024, lm-evaluation-harness queried the model through an SGLang server; for GSM8K, it used its built-in vLLM backend.

## Accuracy

<table>
  <tr>
   <td><strong>Benchmark</strong>
   </td>
   <td><strong>DeepSeek-R1</strong>
   </td>
   <td><strong>DeepSeek-R1-MXFP4 (this model)</strong>
   </td>
   <td><strong>Recovery</strong>
   </td>
  </tr>
  <tr>
   <td>AIME2024
   </td>
   <td>78.00
   </td>
   <td>76.00
   </td>
   <td>97.44%
   </td>
  </tr>
  <tr>
   <td>GPQA Diamond
   </td>
   <td>68.89
   </td>
   <td>68.18
   </td>
   <td>98.97%
   </td>
  </tr>
  <tr>
   <td>GSM8K
   </td>
   <td>95.81
   </td>
   <td>95.42
   </td>
   <td>99.59%
   </td>
  </tr>
</table>
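
The Recovery column is consistent with the quantized score expressed as a percentage of the baseline score. This short check recomputes it from the table values; `recovery` is a hypothetical helper name.

```python
# Scores from the accuracy table: (DeepSeek-R1, DeepSeek-R1-MXFP4)
SCORES = {
    "AIME2024": (78.00, 76.00),
    "GPQA Diamond": (68.89, 68.18),
    "GSM8K": (95.81, 95.42),
}


def recovery(baseline: float, quantized: float) -> float:
    """Quantized score as a percentage of the baseline score, rounded to 2 decimals."""
    return round(100.0 * quantized / baseline, 2)


for task, (base, quant) in SCORES.items():
    print(f"{task}: {recovery(base, quant):.2f}%")
```

Running it prints 97.44%, 98.97%, and 99.59%, matching the table.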

## Reproduction

The results were obtained with the following commands.

### AIME2024

```
# Terminal 1: launch the SGLang server (listens on port 30000 by default)
python3 -m sglang.launch_server \
    --model amd/DeepSeek-R1-MXFP4 \
    --tp 8 \
    --trust-remote-code \
    --n-share-experts-fusion 8 \
    --disable-radix-cache

# Terminal 2: once the server is ready, run the evaluation against it
lm_eval --model local-completions \
    --model_args model=amd/DeepSeek-R1-MXFP4,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
    --tasks aime24 \
    --num_fewshot 0 \
    --gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000" \
    --batch_size auto \
    --log_samples \
    --output_path output_data/DeepSeek-R1-MXFP4
```
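
`sglang.launch_server` runs in the foreground and loading the model can take a while, so `lm_eval` should only be started once the server answers. A small polling sketch, assuming the server exposes the OpenAI-compatible `/v1/models` route on port 30000; `wait_for_server` is a hypothetical helper, not part of SGLang or lm-evaluation-harness.

```python
import http.client
import time
import urllib.request


def wait_for_server(url="http://localhost:30000/v1/models", timeout_s=1800.0, interval_s=5.0):
    """Poll url until it returns HTTP 200; return False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts, and HTTP errors all mean "not ready yet"
            pass
        time.sleep(interval_s)
    return False
```

A launcher script could call `wait_for_server()` and start the `lm_eval` command only when it returns `True`.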

### GSM8K

```
# Set tp_size to the number of GPUs used for tensor parallelism before running
# (the AIME2024 server above uses 8).
lm_eval \
    --model vllm \
    --model_args pretrained=amd/DeepSeek-R1-MXFP4,dtype=auto,add_bos_token=True,tensor_parallel_size=$tp_size,gpu_memory_utilization=0.8,max_model_len=38768 \
    --tasks gsm8k \
    --num_fewshot 8 \
    --batch_size auto \
    --device cuda
```

# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.