linzhao-amd committed
Commit cceef5d · verified · 1 Parent(s): f8a62be

Update README.md

Files changed (1):
  README.md +8 -20
README.md CHANGED
@@ -60,7 +60,7 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati
  </td>
  <td><strong>DeepSeek-R1 </strong>
  </td>
- <td><strong>DeepSeek-R1-MXFP4-ASQ(this model)</strong>
+ <td><strong>DeepSeek-R1-MXFP4-Preview(this model)</strong>
  </td>
  <td><strong>Recovery</strong>
  </td>
@@ -70,9 +70,9 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati
  </td>
  <td>78.0
  </td>
- <td>76.0
+ <td>69.57
  </td>
- <td>97.44%
+ <td>89.19%
  </td>
  </tr>
  <tr>
@@ -90,9 +90,9 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati
  </td>
  <td>95.81
  </td>
- <td>95.42
+ <td>93.95
  </td>
- <td>99.59%
+ <td>98.05%
  </td>
  </tr>
  </table>
@@ -100,12 +100,12 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati

  ### Reproduction

- The results were obtained using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) with native evaluation task GSM8K, and custom task AIME24.
+ The results were obtained using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) with custom evaluation task AIME24 and native task GSM8K.

  ### AIME24
  ```
  lm_eval --model local-completions \
- --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
+ --model_args model=amd/DeepSeek-R1-MXFP4-Preview,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
  --tasks aime24 \
  --num_fewshot 0 \
  --gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000" \
@@ -114,22 +114,10 @@ lm_eval --model local-completions \
  --output_path output_data/aime24 2>&1 | tee logs/aime24.log
  ```

- ### MMLU_COT
- ```
- lm_eval --model local-completions \
- --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
- --tasks mmlu_cot \
- --num_fewshot 0 \
- --gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000" \
- --batch_size auto \
- --log_samples \
- --output_path output_data/mmmlu_cot 2>&1 | tee logs/mmmlu_cot.log
- ```
-
  ### GSM8K
  ```
  lm_eval --model local-completions \
- --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=8096 \
+ --model_args model=amd/DeepSeek-R1-MXFP4-Preview,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=8096 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size auto \
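The Recovery column in the edited table is the quantized model's score divided by the baseline DeepSeek-R1 score, expressed as a percentage. A minimal sketch (not part of the commit) that reproduces the figures touched by this diff:

```python
def recovery(quantized: float, baseline: float) -> float:
    """Score recovery of a quantized model vs. its baseline, as a percentage."""
    return round(quantized / baseline * 100, 2)

# AIME24, baseline 78.0:
old_aime24 = recovery(76.0, 78.0)    # value removed by this commit: 97.44
new_aime24 = recovery(69.57, 78.0)   # value added by this commit: 89.19

# GSM8K, baseline 95.81:
old_gsm8k = recovery(95.42, 95.81)   # value removed by this commit: 99.59

print(old_aime24, new_aime24, old_gsm8k)
```

This matches the removed 97.44% / 99.59% entries and the added 89.19% entry exactly.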