README.md · Pinkstack/Superthoughts-lite-v1 at 64ee892f8cdae7d87d3b2f322dcde58dd678e480

Superthoughts-lite-v1 / README.md

Pinkstack

Update README.md

64ee892 verified about 1 year ago

preview code

raw

history blame

3.17 kB

	---
	library_name: transformers
	tags:
	- trl
	- grpo
	- rl
	- superthoughts
	- reasoning
	- cot
	license: apache-2.0
	datasets:
	- openai/gsm8k
	- Pinkstack/intructions-sft-sharegpt
	language:
	- en
	base_model:
	- HuggingFaceTB/SmolLM2-1.7B-Instruct
	pipeline_tag: text-generation
	widget:
	- messages:
	- role: user
	content: You must act in a conversational matter and always include at the start <think> ... </think> <output> ... </output> tokens.\nHow many R's in strawberry?
	- messages:
	- role: user
	content: You must act in a conversational matter and always include at the start <think> ... </think> <output> ... </output> tokens.\nWhat are you?
	- messages:
	- role: user
	content: You must act in a conversational matter and always include at the start <think> ... </think> <output> ... </output> tokens.\n2x-2=6, how much is X?
	---

	![superthoughts lite](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F6710ba6af1279fe0dfe33afe%2FK5kYIHYj2aX2kB6MlcM9O.png%3C%2Fspan%3E)%3C!-- HTML_TAG_END -->

	# Information
	Advanced, high-quality and lite reasoning for a tiny size that you can run on your phone.

	At original quality, it runs at ~400 tokens/second on a single H100 Nvidia GPU from Friendli.

	Trained similarly to Deepseek R1, we used Smollm2 as a base model, then we've SFT fine tuned on reasoning using our own private superthoughts instruct dataset, which includes a mix of code, website generation, day-to-day questions and answers, math. And then we modified the tokenizer slightly, after the SFT fine tuning we used GRPO to further amplify it's mathematics & problem solving abilities.

	# Format
	```
	<\|im_start\|>user
	How many R's in strawberry<\|im_end\|>
	<\|im_start\|>assistant
	<think>
	Alright, the user has asked how many R's in the word strawberry, that's easy! I just need to count each instance of the letter 'R' in the word 's-t-r-a-w-b-e-r-r-y' and then find out how many R's there are, lets count!
	S - Not an R,
	T - Not an R,
	R - First instance of the letter R! (1),
	A - Not an R,
	W - Not an R,
	B - Not an R,
	E - Not an R,
	R - Great! Second instance of the letter R. (2),
	R - Third instance of the letter R. (3),
	Y - Not an R.

	So, i've counted all the letters correctly, meaning that I am sure that there are 3 R's in the word Strawberry. I should probably let the user know.
	</think>
	<output>3
	</output><\|im_end\|>
	```
	# system prompt
	(important to ensure it would always think, output).
	```
	respond in the following format:
	<think>
	...
	</think>
	<output>
	...
	</output>
	```
	# Examples:
	all responses below generated with our system prompt and a temperature of 0.7.
	Generated inside the android application, ChatterUI via GGUF Q8, using the model's prompt format. and our
	1)
	![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F6710ba6af1279fe0dfe33afe%2F5veZJmkjuv_7W7pKhvsu0.png%3C%2Fspan%3E)%3C!-- HTML_TAG_END -->
	2)
	![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F6710ba6af1279fe0dfe33afe%2FpAwPdVkEZ7rnFf-TZ5tMU.png%3C%2Fspan%3E)%3C!-- HTML_TAG_END -->
	3)
	![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F6710ba6af1279fe0dfe33afe%2FFDaWAAqgv2kvoZvjl8gjl.png%3C%2Fspan%3E)%3C!-- HTML_TAG_END -->

	# Uploaded model

	- Developed by: Pinkstack
	- License: apache-2.0
	- Finetuned from model : HuggingFaceTB/SmolLM2-1.7B-Instruct