linzhao-amd committed 3828dcc (verified) · 1 parent: 39e77c4

Update README.md

Files changed (1): README.md (+130 -1)
---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1
---

# Model Overview

- **Model Architecture:** DeepSeek-R1
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0-Preview
- **Preferred Operating System(s):** Linux
- **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/), [SGLang](https://docs.sglang.ai/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
- **Weight quantization:** OCP MXFP4
- **Activation quantization:** OCP MXFP4
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model is a quantized version of [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), an auto-regressive language model built on a Mixture-of-Experts (MoE) transformer architecture. Its weights and activations were quantized to MXFP4 with [AMD-Quark](https://quark.docs.amd.com/latest/index.html).
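
OCP MXFP4 stores each block of 32 values as 4-bit FP4 (E2M1) elements that share one 8-bit power-of-two (E8M0) scale. The sketch below follows the conversion described in the OCP Microscaling (MX) v1.0 specification: the shared exponent is floor(log2(max|x|)) minus the largest E2M1 exponent (2), and each scaled element is rounded to the nearest E2M1 value, saturating at ±6. It is a pure-Python illustration, not AMD-Quark's implementation; tie rounding, the E8M0 exponent range, and NaN/Inf handling are simplified assumptions.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (the sign is stored separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2     # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32   # number of elements sharing one scale


def quantize_block(block):
    """Quantize one block to (shared_exponent, list of E2M1 values)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)  # simplification for an all-zero block
    # The shared scale is a power of two, stored as an 8-bit exponent (E8M0).
    # Simplification: the exponent is not clamped to the E8M0 range.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    elements = []
    for x in block:
        mag = min(abs(x) / scale, 6.0)  # saturate at the largest E2M1 magnitude
        # Nearest representable magnitude; ties resolve to the smaller value
        # here, a simplification of the specification's rounding rule.
        q = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        elements.append(math.copysign(q, x))
    return shared_exp, elements


def dequantize_block(shared_exp, elements):
    """Multiply each E2M1 element by the block's power-of-two scale."""
    scale = 2.0 ** shared_exp
    return [e * scale for e in elements]


def mxfp4_roundtrip(values):
    """Quantize and dequantize a flat list, one BLOCK_SIZE block at a time."""
    out = []
    for i in range(0, len(values), BLOCK_SIZE):
        exp, elems = quantize_block(values[i:i + BLOCK_SIZE])
        out.extend(dequantize_block(exp, elems))
    return out
```

For example, `mxfp4_roundtrip([100.0])` saturates to 96.0 (scale 16, element 6.0). Storage per block is 32 × 4 bits plus one 8-bit scale, i.e. 4.25 bits per value compared with 16 bits for BFloat16.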
23
+
24
+
25
+ # Model Quantization
26
+
27
+ This model was obtained by quantizing [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)'s weights and activations to MXFP4, using AutoSmoothQuant algorithm in [AMD-Quark](https://quark.docs.amd.com/latest/index.html).
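
AutoSmoothQuant is named after SmoothQuant; how Quark's variant chooses its parameters is not described here, so the sketch below shows only classic SmoothQuant smoothing, on the assumption that the option builds on it. For each input channel j it computes s_j = max|X_j|^α / max|W_j|^(1 - α), divides activation column j by s_j and multiplies weight row j by s_j. The product X·W is unchanged, but activation outliers shrink, which makes the activations easier to quantize. The value α = 0.5 and the helper names are hypothetical.

```python
EPS = 1e-8  # guards against division by zero for all-zero channels


def matmul(a, b):
    """Plain-Python product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def smoothing_factors(x, w, alpha=0.5):
    """Per-input-channel s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha)."""
    k = len(w)  # number of input channels
    act_max = [max(max(abs(row[j]) for row in x), EPS) for j in range(k)]
    w_max = [max(max(abs(v) for v in w[j]), EPS) for j in range(k)]
    return [(a ** alpha) / (m ** (1 - alpha)) for a, m in zip(act_max, w_max)]


def apply_smoothing(x, w, s):
    """Divide activation column j by s_j and multiply weight row j by s_j."""
    x_s = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_s = [[v * s[j] for v in w[j]] for j in range(len(w))]
    return x_s, w_s


# Channel 1 of the activations carries an outlier (100.0).
x = [[1.0, 100.0], [-2.0, 50.0]]
w = [[0.5, 1.0], [0.01, -0.02]]
x_s, w_s = apply_smoothing(x, w, smoothing_factors(x, w))
```

After smoothing, the outlier column of `x_s` peaks at about 1.41 instead of 100, while `matmul(x_s, w_s)` equals `matmul(x, w)` up to floating-point rounding.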

**Quantization script:**
```
# First dequantize the FP8 pretrained model to BFloat16, then quantize the BFloat16 model with the script below.
# MODEL_DIR points to the dequantized BFloat16 checkpoint.

cd Quark/examples/torch/language_modeling/llm_ptq/
python3 quantize_quark.py --model_dir $MODEL_DIR \
    --quant_scheme w_mxfp4_a_mxfp4 \
    --num_calib_data 128 \
    --exclude_layers "*mlp.gate.*" "*lm_head" \
    --multi_gpu \
    --quant_algo autosmoothquant \
    --model_export hf_format \
    --output_dir amd/DeepSeek-R1-MXFP4
```
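
The first step above, dequantizing the FP8 checkpoint, can be pictured as a per-tile multiply. The sketch assumes DeepSeek-R1's published layout of one scale per 128 × 128 weight tile (`weight_block_size` in its `quantization_config`, stored as `weight_scale_inv` tensors) and that each scale multiplies its tile. FP8 values are represented as already-decoded Python floats and the final BFloat16 cast is omitted; this is not the conversion tool used for this model, which is not named here.

```python
def dequantize_blockwise(w_fp8, scale_inv, block=128):
    """Multiply each (block x block) tile of w_fp8 by its per-tile scale.

    w_fp8:     rows x cols weights, FP8 values already decoded to floats.
    scale_inv: ceil(rows / block) x ceil(cols / block) grid of scales
               (assumed layout, one scale per tile).
    """
    return [[v * scale_inv[i // block][j // block] for j, v in enumerate(row)]
            for i, row in enumerate(w_fp8)]


# Tiny example with 2 x 2 tiles; the last row and column form partial tiles.
w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
scales = [[10.0, 100.0], [0.5, 2.0]]
w_dequant = dequantize_blockwise(w, scales, block=2)
```

Partial tiles at the matrix edge simply reuse the scale of the tile they fall in, which is why the scale grid size is rounded up.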

# Deployment

## Use with SGLang

This model can be deployed efficiently using the [SGLang](https://docs.sglang.ai/) backend.
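
A minimal client sketch, assuming a server started with the `sglang.launch_server` command from the AIME2024 reproduction steps. SGLang serves an OpenAI-compatible API; port 30000 and the `/v1/completions` route are the ones the `lm_eval` command targets. The client uses only the Python standard library and the sampling settings from that evaluation (temperature 0.6, top-p 0.95); the `max_tokens` default and the helper names are illustrative assumptions.

```python
import json
import urllib.request

# Endpoint and model name mirror the lm_eval command in the reproduction steps.
BASE_URL = "http://localhost:30000/v1/completions"
MODEL = "amd/DeepSeek-R1-MXFP4"


def build_request(prompt, max_tokens=2048, temperature=0.6, top_p=0.95):
    """Build an OpenAI-style completion request body (max_tokens is illustrative)."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }


def complete(prompt, timeout=600):
    """POST one completion request and return the generated text.

    Requires a running server at BASE_URL.
    """
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["text"]
```

With the server running, `print(complete("What is 17 * 23? Think step by step."))` returns the model's continuation. The evaluation allows up to 32000 generated tokens because R1-style models write long reasoning traces, so raise `max_tokens` for hard prompts.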

# Evaluation

The model was evaluated on AIME2024, GPQA Diamond, and GSM8K with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). In the reproduction commands below, AIME2024 is run against a model served by SGLang, while GSM8K uses lm-evaluation-harness's vLLM backend.

## Accuracy

<table>
  <tr>
    <td><strong>Benchmark</strong></td>
    <td><strong>DeepSeek-R1</strong></td>
    <td><strong>DeepSeek-R1-MXFP4 (this model)</strong></td>
    <td><strong>Recovery</strong></td>
  </tr>
  <tr>
    <td>AIME2024</td>
    <td>78.00</td>
    <td>76.00</td>
    <td>97.44%</td>
  </tr>
  <tr>
    <td>GPQA Diamond</td>
    <td>68.89</td>
    <td>68.18</td>
    <td>98.97%</td>
  </tr>
  <tr>
    <td>GSM8K</td>
    <td>95.81</td>
    <td>95.42</td>
    <td>99.59%</td>
  </tr>
</table>
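
Recovery is the quantized model's score expressed as a percentage of the baseline DeepSeek-R1 score. The short check below recomputes that column from the table's values.

```python
# Scores from the accuracy table: (DeepSeek-R1, DeepSeek-R1-MXFP4).
SCORES = {
    "AIME2024": (78.00, 76.00),
    "GPQA Diamond": (68.89, 68.18),
    "GSM8K": (95.81, 95.42),
}


def recovery(baseline, quantized):
    """Quantized score as a percentage of the baseline score."""
    return 100.0 * quantized / baseline


for name, (base, quant) in SCORES.items():
    print(f"{name}: {recovery(base, quant):.2f}%")
# AIME2024: 97.44%
# GPQA Diamond: 98.97%
# GSM8K: 99.59%
```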

## Reproduction

The results were obtained with the following commands.

### AIME2024
```
# Start the SGLang server and keep it running (for example, in a separate shell).
python3 -m sglang.launch_server \
    --model amd/DeepSeek-R1-MXFP4 \
    --tp 8 \
    --trust-remote-code \
    --n-share-experts-fusion 8 \
    --disable-radix-cache

# Once the server is up, run the evaluation against its completions endpoint.
lm_eval --model local-completions \
    --model_args model=amd/DeepSeek-R1-MXFP4,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
    --tasks aime24 \
    --num_fewshot 0 \
    --gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000" \
    --batch_size auto \
    --log_samples \
    --output_path output_data/DeepSeek-R1-MXFP4
```

### GSM8K
```
# Set tp_size to the number of GPUs used for tensor parallelism.
# ROCm builds of PyTorch expose AMD GPUs under the "cuda" device name.
lm_eval \
    --model vllm \
    --model_args pretrained=amd/DeepSeek-R1-MXFP4,dtype=auto,add_bos_token=True,tensor_parallel_size=$tp_size,gpu_memory_utilization=0.8,max_model_len=38768 \
    --tasks gsm8k \
    --num_fewshot 8 \
    --batch_size auto \
    --device cuda
```
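
The GSM8K score depends on how a final number is extracted from each generation. GSM8K reference solutions end with `#### <answer>`, and the 8 few-shot exemplars show the model that format; lm-evaluation-harness's `gsm8k` task typically reports a strict-match score keyed on that marker and a more lenient flexible-extract score. The sketch below approximates both strategies; the regular expressions, normalization, and helper names are assumptions modeled on, not copied from, the harness's filters.

```python
import re

# Assumed strict extractor: the number following "#### ".
STRICT = re.compile(r"#### (-?[0-9.,]+)")
# Assumed lenient extractor: any number-like token; the last one is used.
FLEXIBLE = re.compile(r"-?[$0-9.,]{2,}|-?[0-9]+")


def normalize(num):
    """Drop thousands separators, dollar signs, and a trailing period."""
    return num.replace(",", "").replace("$", "").rstrip(".")


def extract_strict(text):
    m = STRICT.search(text)
    return normalize(m.group(1)) if m else None


def extract_flexible(text):
    matches = FLEXIBLE.findall(text)
    return normalize(matches[-1]) if matches else None


def is_correct(generation, reference, strict=True):
    """Compare the extracted answer with the number after '####' in the reference."""
    gold = extract_strict(reference)
    pred = extract_strict(generation) if strict else extract_flexible(generation)
    return pred is not None and pred == gold
```

A generation that ends with "So the total is 72." without the `####` marker fails the strict check but passes the lenient one, which is why the two scores can differ.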

# License

Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.