linzhao-amd committed
Commit cceef5d · verified · 1 Parent(s): f8a62be

Update README.md

Files changed (1):
  README.md +8 -20
README.md CHANGED
@@ -60,7 +60,7 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati
  </td>
  <td><strong>DeepSeek-R1 </strong>
  </td>
- <td><strong>DeepSeek-R1-MXFP4-ASQ(this model)</strong>
+ <td><strong>DeepSeek-R1-MXFP4-Preview(this model)</strong>
  </td>
  <td><strong>Recovery</strong>
  </td>
@@ -70,9 +70,9 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati
  </td>
  <td>78.0
  </td>
- <td>76.0
+ <td>69.57
  </td>
- <td>97.44%
+ <td>89.19%
  </td>
  </tr>
  <tr>
@@ -90,9 +90,9 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati
  </td>
  <td>95.81
  </td>
- <td>95.42
+ <td>93.95
  </td>
- <td>99.59%
+ <td>98.05%
  </td>
  </tr>
  </table>
@@ -100,12 +100,12 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati

  ### Reproduction

- The results were obtained using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) with native evaluation task GSM8K, and custom task AIME24.
+ The results were obtained using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) with custom evaluation task AIME24 and native task GSM8K.

  ### AIME24
  ```
  lm_eval --model local-completions \
- --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
+ --model_args model=amd/DeepSeek-R1-MXFP4-Preview,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
  --tasks aime24 \
  --num_fewshot 0 \
  --gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000" \
@@ -114,22 +114,10 @@ lm_eval --model local-completions \
  --output_path output_data/aime24 2>&1 | tee logs/aime24.log
  ```

- ### MMLU_COT
- ```
- lm_eval --model local-completions \
- --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
- --tasks mmlu_cot \
- --num_fewshot 0 \
- --gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000" \
- --batch_size auto \
- --log_samples \
- --output_path output_data/mmmlu_cot 2>&1 | tee logs/mmmlu_cot.log
- ```
-
  ### GSM8K
  ```
  lm_eval --model local-completions \
- --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=8096 \
+ --model_args model=amd/DeepSeek-R1-MXFP4-Preview,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=8096 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size auto \
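The Recovery column in the edited table is the quantized model's score divided by the baseline DeepSeek-R1 score, expressed as a percentage. A minimal sketch (not part of the commit) that reproduces the figures touched by this diff:

```python
def recovery(quantized: float, baseline: float) -> float:
    """Score recovery of a quantized model vs. its baseline, as a percentage."""
    return round(quantized / baseline * 100, 2)

# AIME24, baseline 78.0:
old_aime24 = recovery(76.0, 78.0)    # value removed by this commit: 97.44
new_aime24 = recovery(69.57, 78.0)   # value added by this commit: 89.19

# GSM8K, baseline 95.81:
old_gsm8k = recovery(95.42, 95.81)   # value removed by this commit: 99.59

print(old_aime24, new_aime24, old_gsm8k)
```

This matches the removed 97.44% / 99.59% entries and the added 89.19% entry exactly.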