---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1
---

# Model Overview

- **Model Architecture:** DeepSeek-R1
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0-Preview
- **Preferred Operating System(s):** Linux
- **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
- **Weight quantization:** OCP MXFP4
- **Activation quantization:** OCP MXFP4
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model is the quantized version of [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), an auto-regressive language model that uses an optimized transformer architecture. The MXFP4 model is quantized with [AMD-Quark](https://quark.docs.amd.com/latest/index.html).

# Model Quantization

This model was obtained by quantizing the weights and activations of [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) to MXFP4, using the AutoSmoothQuant algorithm in [AMD-Quark](https://quark.docs.amd.com/latest/index.html).

**Quantization scripts:**

```
# Dequantize the FP8 pretrained model to BFloat16, then quantize the BFloat16 model using the following script.
cd Quark/examples/torch/language_modeling/llm_ptq/
python3 quantize_quark.py --model_dir $MODEL_DIR \
                          --quant_scheme w_mxfp4_a_mxfp4 \
                          --num_calib_data 128 \
                          --exclude_layers "*mlp.gate.*" "*lm_head" \
                          --multi_gpu \
                          --quant_algo autosmoothquant \
                          --model_export hf_format \
                          --output_dir amd/DeepSeek-R1-MXFP4
```

# Deployment

## Use with SGLang

This model can be deployed efficiently using the [SGLang](https://docs.sglang.ai/) backend.

# Evaluation

The model was evaluated on AIME2024, GPQA Diamond, and GSM8K. Evaluation was conducted using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework and the SGLang engine.

### Accuracy
| Benchmark    | DeepSeek-R1 | DeepSeek-R1-MXFP4 (this model) | Recovery |
|--------------|-------------|--------------------------------|----------|
| AIME2024     | 78.00       | 76.00                          | 97.44%   |
| GPQA Diamond | 68.89       | 68.18                          | 98.97%   |
| GSM8K        | 95.81       | 95.42                          | 99.59%   |
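To make the weight and activation format concrete: OCP MXFP4 stores each block of 32 values as 4-bit floats (E2M1, representable magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6}) sharing one power-of-two scale. The sketch below illustrates this block-quantization idea in NumPy; it is a simplified illustration, not AMD-Quark's actual implementation.

```python
import numpy as np

# Magnitudes representable in FP4 (E2M1); sign is stored separately.
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block: np.ndarray) -> np.ndarray:
    """Quantize one block of 32 floats to an MXFP4-like grid and dequantize back.

    Illustrative only: real MXFP4 packs 4-bit codes plus an 8-bit E8M0 scale.
    """
    amax = np.abs(block).max()
    if amax == 0.0:
        return np.zeros_like(block)
    # Shared power-of-two scale chosen so the largest magnitude fits under FP4 max (6.0).
    scale = 2.0 ** np.ceil(np.log2(amax / 6.0))
    scaled = np.abs(block) / scale
    # Round each scaled magnitude to the nearest representable FP4 value.
    idx = np.abs(scaled[:, None] - FP4_VALUES[None, :]).argmin(axis=1)
    return np.sign(block) * FP4_VALUES[idx] * scale

x = np.linspace(-5.0, 5.0, 32)
xq = quantize_mxfp4_block(x)
# With amax = 5, the shared scale is 2**0 = 1, so every |xq| lands exactly on the FP4 grid.
```

The per-block power-of-two scale is what lets MXFP4 cover a wide dynamic range with only 4 bits per element, which is why it is applied to both weights and activations here.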
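The SGLang deployment mentioned above can be sketched as follows. The flag names, tensor-parallel degree, and port are assumptions for illustration; check the SGLang documentation for the version you run.

```shell
# Hypothetical launch command; adjust --tp to the number of GPUs available.
python3 -m sglang.launch_server \
    --model-path amd/DeepSeek-R1-MXFP4 \
    --tp 8 \
    --trust-remote-code \
    --port 30000

# Once the server is up, query its OpenAI-compatible endpoint:
curl http://localhost:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "amd/DeepSeek-R1-MXFP4", "messages": [{"role": "user", "content": "What is 2+2?"}]}'
```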
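A run of the evaluation setup described above might look like the sketch below, pointing lm-evaluation-harness at the served model's OpenAI-compatible endpoint. The task name, `model_args`, and few-shot count are assumptions and are not guaranteed to match the exact configuration used to produce the accuracy table.

```shell
# Hypothetical lm-evaluation-harness invocation against a local SGLang server.
lm_eval --model local-completions \
    --model_args model=amd/DeepSeek-R1-MXFP4,base_url=http://localhost:30000/v1/completions,num_concurrent=8 \
    --tasks gsm8k \
    --num_fewshot 5 \
    --batch_size 1
```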