Update README.md

README.md CHANGED
@@ -60,7 +60,7 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati
 </td>
 <td><strong>DeepSeek-R1 </strong>
 </td>
-<td><strong>DeepSeek-R1-MXFP4-ASQ(this model)</strong>
+<td><strong>DeepSeek-R1-MXFP4-Preview(this model)</strong>
 </td>
 <td><strong>Recovery</strong>
 </td>
@@ -70,9 +70,9 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati
 </td>
 <td>78.0
 </td>
-<td>
+<td>69.57
 </td>
-<td>
+<td>89.19%
 </td>
 </tr>
 <tr>
@@ -90,9 +90,9 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati
 </td>
 <td>95.81
 </td>
-<td>95
+<td>93.95
 </td>
-<td>
+<td>98.05%
 </td>
 </tr>
 </table>
@@ -100,12 +100,12 @@ The model was evaluated using [SGLang](https://docs.sglang.ai/) and [lm-evaluati
 
 ### Reproduction
 
-The results were obtained using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) with
+The results were obtained using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) with custom evaluation task AIME24 and native task GSM8K.
 
 ### AIME24
 ```
 lm_eval --model local-completions \
-    --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
+    --model_args model=amd/DeepSeek-R1-MXFP4-Preview,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
     --tasks aime24 \
     --num_fewshot 0 \
     --gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000" \
@@ -114,22 +114,10 @@ lm_eval --model local-completions \
     --output_path output_data/aime24 2>&1 | tee logs/aime24.log
 ```
 
-### MMLU_COT
-```
-lm_eval --model local-completions \
-    --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
-    --tasks mmlu_cot \
-    --num_fewshot 0 \
-    --gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000" \
-    --batch_size auto \
-    --log_samples \
-    --output_path output_data/mmmlu_cot 2>&1 | tee logs/mmmlu_cot.log
-```
-
 ### GSM8K
 ```
 lm_eval --model local-completions \
-    --model_args model=amd/DeepSeek-R1-MXFP4-ASQ,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=8096 \
+    --model_args model=amd/DeepSeek-R1-MXFP4-Preview,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=8096 \
     --tasks gsm8k \
     --num_fewshot 5 \
     --batch_size auto \