DistilQwen
H100 BF16 training. 30B → 1.7B/0.6B TKD. Three teachers. 15 models + the DISC paper. 10K+ downloads. DOIs: 10.57967/hf/8165 and 10.57967/hf/8194.
Text Generation • 2B • Updated • 1.65k • Note: 30B teacher, 1.7B student. Proof-weighted KD at 2.25× on reasoning.
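The note above mentions proof-weighted KD. The exact loss used in this collection isn't published on the page, but a minimal sketch of temperature-softened knowledge distillation with a per-token weight (the weighting-by-proof-signal idea is hypothetical here) looks like:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of raw logits."""
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(teacher_logits, student_logits, T=2.0, weight=1.0):
    """Forward KL(teacher || student) on softened distributions,
    scaled by T^2 as in standard KD, times an optional per-token
    weight (e.g. up-weighting tokens inside verified proof spans --
    that weighting scheme is an assumption, not the author's method)."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return weight * (T * T) * kl

loss = kd_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8], T=2.0, weight=2.25)
```

When teacher and student agree exactly, the loss is zero; the weight only rescales how much a given token position contributes to the gradient.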
reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF
Text Generation • 2B • Updated • 820 • Note: Edge deployment of the full Instruct pipeline. Apache 2.0.
reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B-SFT
2B • Updated • 152 • Note: Second stage: distil → SFT on instruction-following data.
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B
Text Generation • 0.8B • Updated • 1.59k • Note: 50× compression: 30B → 0.6B. Smallest in the distil family.
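The 50× figure is by total parameter count. A quick check of the ratios across the family (note the A3B teacher is a mixture-of-experts model that activates roughly 3B parameters per token, so the active-parameter ratio is smaller):

```python
# Compression ratios for the collection, by total parameters (billions).
teacher_b = 30.0
students_b = {"Qwen3-1.7B": 1.7, "Qwen3-0.6B": 0.6}
ratios = {name: round(teacher_b / size, 1) for name, size in students_b.items()}
print(ratios)  # {'Qwen3-1.7B': 17.6, 'Qwen3-0.6B': 50.0}
```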
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT
Text Generation • 0.8B • Updated • 1.63k • 2 • Note: Thinking teacher + SFT at 0.6B. Extended deliberation traces.
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF
Text Generation • 0.8B • Updated • 807 • Note: Edge deployment of extended-thinking at 0.6B. Apache 2.0.
reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT
Text Generation • 2B • Updated • 1.66k • 1 • Note: Coder teacher produces uniquely structured distributions.
reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT-GGUF
Text Generation • 2B • Updated • 1.33k • 1 • Note: Pair with the Thinking variant for comparative analysis.
reaperdoesntknow/DistilQwen3-1.7B-uncensored
Text Generation • 2B • Updated • 1.39k • Note: Foundation for research applications requiring unfiltered output.
reaperdoesntknow/TopologicalQwen
Text Generation • 2B • Updated • 1.83k • Note: Topology-aware distillation from 30B-Thinking on physics CoT.
reaperdoesntknow/DiStil-Qwen3-1.7B-uncensored
2B • Updated • 183 • Note: Bridge between the base distil and topology-aware models.
reaperdoesntknow/Disctil-Qwen3-1.7B
Text Generation • 2B • Updated • 1.37k • Note: DISC-refined. Discrepancy-aware training produces a cleaner signal.
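The DISC method itself isn't specified on this page. A generic sketch of what "discrepancy-aware" could mean in a distillation loop, namely measuring per-token teacher–student disagreement so those positions can be re-weighted (whether DISC up- or down-weights them, and its exact form, are not stated here):

```python
import math

def softmax(logits):
    """Plain softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def total_variation(p, q):
    """Total-variation distance between two distributions, in [0, 1]."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def discrepancy_per_token(teacher_logits, student_logits):
    """One score per token position: larger where teacher and student
    disagree more. A generic illustration, not the DISC algorithm."""
    return [total_variation(softmax(t), softmax(s))
            for t, s in zip(teacher_logits, student_logits)]
```

Positions where the two models already agree score near zero, so a trainer could concentrate (or discount) loss on the disagreeing positions.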
reaperdoesntknow/DistilQwen3-1.7B-uncensored-GGUF
2B • Updated • 1.44k • 1 • Note: Uncensored base, quantized. mradermacher also published quants (411 downloads).
reaperdoesntknow/Qwen3-1.7B-Thinking-Distil
Text Generation • 2B • Updated • 1.86k • 1 • Note: Thinking teacher distillation. Highest downloads in the collection.
reaperdoesntknow/LFM2.5-1.2B-Distilled-SFT
Text Generation • 1B • Updated • 946 • Note: Proves TKD works across architecture families, not just within Qwen.
reaperdoesntknow/Discrepancy_Calculus
Updated • Note: Continuous Thought Dynamics, the mathematical backbone of DualMind.