| | --- |
| | library_name: transformers |
| | tags: |
| | - trl |
| | - grpo |
| | - rl |
| | - superthoughts |
| | - reasoning |
| | - cot |
| | license: apache-2.0 |
| | datasets: |
| | - openai/gsm8k |
| | - Pinkstack/intructions-sft-sharegpt |
| | language: |
| | - en |
| | base_model: |
| | - HuggingFaceTB/SmolLM2-1.7B-Instruct |
| | pipeline_tag: text-generation |
| | widget: |
| | - messages: |
| | - role: user |
| | content: You must act in a conversational matter and always include at the start <think> ... </think> <output> ... </output> tokens.\nHow many R's in strawberry? |
| | - messages: |
| | - role: user |
| | content: You must act in a conversational matter and always include at the start <think> ... </think> <output> ... </output> tokens.\nWhat are you? |
| | - messages: |
| | - role: user |
| | content: You must act in a conversational matter and always include at the start <think> ... </think> <output> ... </output> tokens.\n2x-2=6, how much is X? |
| | --- |
| | |
| |
|
| | # Information |
| | Advanced, high-quality, **lite** reasoning in a tiny model that you can run on your phone. |
| |
|
| | At original quality, it runs at ~400 tokens/second on a single NVIDIA H100 GPU from Friendli. |
| |
|
| | The model was trained similarly to DeepSeek R1. We used SmolLM2 as the base model, then SFT fine-tuned it on reasoning using our own private Superthoughts instruct dataset, which includes a mix of code, website generation, day-to-day questions and answers, and math. We then modified the tokenizer slightly. After the SFT fine-tuning, we used GRPO to further amplify its mathematics and problem-solving abilities. |
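To make the GRPO step more concrete, here is a minimal sketch of the group-relative advantage that gives GRPO its name: several completions are sampled for the same prompt, each gets a reward, and each reward is normalized against the others in its group. This is not our training code. The function name, the `eps` value, the use of the population standard deviation, and the example rewards are all assumptions made for illustration, and real implementations differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration only. `eps` avoids division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions sampled for one prompt
# (1.0 = correct final answer, 0.0 = incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that score above their group's average get a positive advantage and are reinforced, while those below the average are pushed down, so no separate value model is needed.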
| |
|
| | # Format |
| | ``` |
| | <|im_start|>user |
| | How many R's in strawberry<|im_end|> |
| | <|im_start|>assistant |
| | <think> |
| | Alright, the user has asked how many R's in the word strawberry, that's easy! I just need to count each instance of the letter 'R' in the word 's-t-r-a-w-b-e-r-r-y' and then find out how many R's there are, lets count! |
| | S - Not an R, |
| | T - Not an R, |
| | R - First instance of the letter R! (1), |
| | A - Not an R, |
| | W - Not an R, |
| | B - Not an R, |
| | E - Not an R, |
| | R - Great! Second instance of the letter R. (2), |
| | R - Third instance of the letter R. (3), |
| | Y - Not an R. |
| | |
| | So, i've counted all the letters correctly, meaning that I am sure that there are 3 R's in the word Strawberry. I should probably let the user know. |
| | </think> |
| | <output>3 |
| | </output><|im_end|> |
| | ``` |
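Because the answer always sits inside `<output>` tags, an application will usually want to separate the reasoning from the final answer before showing it to the user. The following is a minimal sketch of one way to do that with the standard library. The function name `parse_response` and the shortened sample string are hypothetical and only illustrate the format shown above.

```python
import re

# Matches a <think> block followed by an <output> block, as in the format above.
THINK_OUTPUT_RE = re.compile(
    r"<think>\s*(?P<think>.*?)\s*</think>\s*<output>\s*(?P<output>.*?)\s*</output>",
    re.DOTALL,
)


def parse_response(text):
    """Return (reasoning, answer), or None if the text does not follow the format."""
    match = THINK_OUTPUT_RE.search(text)
    if match is None:
        return None
    return match.group("think"), match.group("output")


# Shortened, made-up response in the same shape as the example above.
sample = "<think>\nCount the R's: 3.\n</think>\n<output>3\n</output><|im_end|>"
parsed = parse_response(sample)
print(parsed)
```

Returning `None` for malformed text lets the caller decide whether to retry the generation or fall back to showing the raw response.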
| | # System prompt |
| | (Important: it ensures the model always produces a think section followed by an output section.) |
| | ``` |
| | respond in the following format: |
| | <think> |
| | ... |
| | </think> |
| | <output> |
| | ... |
| | </output> |
| | ``` |
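As a rough sketch, the system prompt can be combined with a user message into a single prompt string like this. The format section above only shows user and assistant turns, so wrapping the system prompt in a `system` turn with the same `<|im_start|>` / `<|im_end|>` markers is an assumption, and `build_prompt` is a hypothetical helper. In practice, the chat template shipped with the tokenizer (for example through `tokenizer.apply_chat_template` in transformers) is the safer way to build the prompt.

```python
# The system prompt from this section, written as one Python string.
SYSTEM_PROMPT = (
    "respond in the following format:\n"
    "<think>\n...\n</think>\n"
    "<output>\n...\n</output>"
)


def build_prompt(user_message, system_prompt=SYSTEM_PROMPT):
    """Assemble a ChatML-style prompt that ends with the assistant header.

    Hypothetical helper: the `system` turn layout is assumed, not confirmed.
    """
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_prompt("How many R's in strawberry")
print(prompt)
```

Ending the string with the assistant header leaves the model to start its turn with the `<think>` section.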
| | # Examples |
| | All responses below were generated with our system prompt and a temperature of 0.7, inside the Android application ChatterUI via GGUF Q8, using the model's prompt format. |
| | [Examples 1 to 3: screenshots not included] |
| |
|
| | # Uploaded model |
| |
|
| | - **Developed by:** Pinkstack |
| | - **License:** apache-2.0 |
| | - **Finetuned from model:** HuggingFaceTB/SmolLM2-1.7B-Instruct |