---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1
---

# Model Overview

- **Model Architecture:** DeepSeek-R1
  - **Input:** Text
  - **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0-Preview
- **Preferred Operating System(s):** Linux
- **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
- **Weight quantization:** OCP MXFP4
- **Activation quantization:** OCP MXFP4
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model is a quantized version of [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), an auto-regressive language model built on an optimized transformer architecture. The MXFP4 model was quantized with [AMD-Quark](https://quark.docs.amd.com/latest/index.html).

# Model Quantization

This model was obtained by quantizing the weights and activations of [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) to MXFP4, using the AutoSmoothQuant algorithm in [AMD-Quark](https://quark.docs.amd.com/latest/index.html).

**Quantization scripts:**

```
# Dequantize the FP8 pretrained model to BFloat16, then quantize the
# resulting BFloat16 model with the following script.
cd Quark/examples/torch/language_modeling/llm_ptq/
python3 quantize_quark.py --model_dir $MODEL_DIR \
                          --quant_scheme w_mxfp4_a_mxfp4 \
                          --num_calib_data 128 \
                          --exclude_layers "*mlp.gate.*" "*lm_head" \
                          --multi_gpu \
                          --quant_algo autosmoothquant \
                          --model_export hf_format \
                          --output_dir amd/DeepSeek-R1-MXFP4
```
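The FP8-to-BFloat16 dequantization mentioned in the comment above is not spelled out in this card. One way to perform it (an assumption on our part, with placeholder paths, not a step prescribed by this card) is the `fp8_cast_bf16.py` helper shipped in the [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repository:

```
# Example only: dequantize the FP8 release to BFloat16 with DeepSeek's helper
# script, then point quantize_quark.py's --model_dir at the BF16 output.
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference
python3 fp8_cast_bf16.py \
    --input-fp8-hf-path /path/to/DeepSeek-R1 \
    --output-bf16-hf-path /path/to/DeepSeek-R1-BF16   # use this as $MODEL_DIR
```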
# Deployment

### Use with SGLang

This model can be deployed efficiently using the [SGLang](https://docs.sglang.ai/) backend, as sketched below.
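As a minimal serving sketch: the launch flags below mirror the AIME2024 evaluation command in the Reproduction section, while the `curl` query is an illustrative assumption relying on SGLang's default port (30000) and its OpenAI-compatible completions API.

```
# Launch an SGLang server with 8-way tensor parallelism.
python3 -m sglang.launch_server \
    --model amd/DeepSeek-R1-MXFP4 \
    --tp 8 \
    --trust-remote-code

# Once the server is up, query the OpenAI-compatible completions endpoint.
curl http://localhost:30000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "amd/DeepSeek-R1-MXFP4", "prompt": "Explain MXFP4 quantization in one sentence.", "max_tokens": 128}'
```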
## Evaluation

The model was evaluated on AIME2024, GPQA Diamond, and GSM8K. Evaluation was conducted using the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework with the SGLang and vLLM engines (see the Reproduction section below).

### Accuracy

| Benchmark | DeepSeek-R1 | DeepSeek-R1-MXFP4 (this model) | Recovery |
|-----------|-------------|--------------------------------|----------|
| AIME2024 | 78.00 | 76.00 | 97.44% |
| GPQA Diamond | 68.89 | 68.18 | 98.97% |
| GSM8K | 95.81 | 95.42 | 99.59% |
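In the table above, Recovery is the MXFP4 score expressed as a percentage of the baseline DeepSeek-R1 score; for example, the AIME2024 row gives 76.00 / 78.00 ≈ 97.44%.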
### Reproduction

The results were obtained using the following commands:

#### AIME2024

```
# Launch the SGLang server.
python3 -m sglang.launch_server \
    --model amd/DeepSeek-R1-MXFP4 \
    --tp 8 \
    --trust-remote-code \
    --n-share-experts-fusion 8 \
    --disable-radix-cache

# Run AIME2024 via lm-evaluation-harness against the local endpoint.
lm_eval --model local-completions \
    --model_args model=amd/DeepSeek-R1-MXFP4,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
    --tasks aime24 \
    --num_fewshot 0 \
    --gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000" \
    --batch_size auto \
    --log_samples \
    --output_path output_data/DeepSeek-R1-MXFP4
```

#### GSM8K

```
lm_eval \
    --model vllm \
    --model_args pretrained=amd/DeepSeek-R1-MXFP4,dtype=auto,add_bos_token=True,tensor_parallel_size=$tp_size,gpu_memory_utilization=0.8,max_model_len=38768 \
    --tasks gsm8k \
    --num_fewshot 8 \
    --batch_size auto \
    --device cuda
```

# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.