tngtech/DeepSeek-TNG-R1T2-Chimera

TNGHK committed on Aug 13, 2025

Commit 2827249 · verified · 1 Parent(s): cf3c8de

Update README.md

Files changed (1)

README.md +8 -8

@@ -61,14 +61,14 @@ R1T2 operates at a new sweet spot in intelligence vs. output token length. It ap
 Evaluation was performed using the evalchemy framework (pass@1 averaged over 10/5 runs for AIME/GPQAD, at a temperature of 0.6).
 We report measured benchmark results for our R1T2, R1T models and published benchmark results for V3-0324, R1, R1-0528.
-|                                    | R1T2 |  R1T | V3-0324 |   R1 | R1-0528 | Comment |
-|:-----------------------------------|-----:|-----:|--------:|-----:|--------:|:--------|
-| AIME-24                            | 82.3 | 74.7 |    59.4 | 79.8 |    91.4 |         |
-| AIME-25                            | 70.0 | 58.3 |    49.6 | 70.0 |    87.5 | V3-0324 source: AIME-25 measured by us |
-| GPQA-Diamond                       | 77.9 | 72.0 |    68.4 | 71.5 |    81.0 |         |
-| Aider Polyglot                     | 64.4 | 48.4 |    44.9 | 52.0 |    71.6 | R1T2 source: Aider discord, t=0.75 |
-| EQ-Bench Longform Creative Writing | 76.4 |  ./. |    78.1 | 74.6 |    78.9 | see [EQ Bench](https://eqbench.com/creative_writing_longform.html)  |
-| Vectara Hallucination Rate         |  5.5 |  ./. |     8.0 | 14.3 |     7.7 | see [Hallucination Leaderboard](https://github.com/vectara/hallucination-leaderboard), lower hallucination rates are better |
+|                                    | R1T2 |  R1T | V3-0324 |   R1 | R1-0528 | Comment | Special source |
+|:-----------------------------------|-----:|-----:|--------:|-----:|--------:|:--------|:--------|
+| AIME-24                            | 82.3 | 74.7 |    59.4 | 79.8 |    91.4 |         |         |
+| AIME-25                            | 70.0 | 58.3 |    49.6 | 70.0 |    87.5 |         | V3-0324 AIME-25 measured by us |
+| GPQA-Diamond                       | 77.9 | 72.0 |    68.4 | 71.5 |    81.0 |         |         |
+| Aider Polyglot                     | 64.4 | 48.4 |    44.9 | 52.0 |    71.6 | R1T2 beats two of its parents, V3-0324 and R1, and was measured to be about 2.2 times more token efficient, i.e. faster, than its third parent, R1-0528 | R1T2 source: Aider discord, t=0.75 |
+| EQ-Bench Longform Creative Writing | 76.4 |  ./. |    78.1 | 74.6 |    78.9 | EQ Bench version before August 8th, 2025 | see [EQ Bench](https://eqbench.com/creative_writing_longform.html)  |
+| Vectara Hallucination Rate         |  5.5 |  ./. |     8.0 | 14.3 |     7.7 | lower hallucination rates are better, R1T2 is better than all its three parents | see [Hallucination Leaderboard](https://github.com/vectara/hallucination-leaderboard) |
 ## Technological background