---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM-360M-Instruct
tags:
- alignment-handbook
- trl
- sft
- mlx
- apple-silicon
- on-device
- tiny-llm
- smollm
- quantized
datasets:
- Magpie-Align/Magpie-Pro-300K-Filtered
- bigcode/self-oss-instruct-sc2-exec-filter-50k
- teknium/OpenHermes-2.5
- HuggingFaceTB/everyday-conversations-llama3.1-2k
library_name: mlx
language:
- en
pipeline_tag: text-generation
---
# SmolLM-360M-Instruct (MLX 5-bit)
A 5-bit MLX quantized build of [HuggingFaceTB/SmolLM-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-360M-Instruct), targeting a better quality/footprint balance than the 3-bit variant.
## Benchmark Environment
- Device: MacBook Pro (M3 Pro)
- Runtime: MLX
- Quantization: 5-bit
## Performance (Measured)
- Disk size: ~241 MB
- Peak memory: ~0.29 GB
- Generation speed: ~296 tokens/sec
These numbers were measured on macOS with an M3 Pro; iPhone and iPad performance will vary with hardware and available memory.
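As a rough sanity check on the disk figure, the size of a group-quantized model can be estimated from bits per weight plus per-group overhead. The sketch below assumes ~360M parameters and MLX's default group size of 64 with an fp16 scale and bias per group; these are assumptions for illustration, not values taken from this build.

```python
# Back-of-envelope disk-size estimate for group-wise 5-bit quantization.
# Assumptions (not measured from this model): ~360M parameters,
# group size 64, one fp16 scale + one fp16 bias stored per group.

def quantized_size_mb(n_params: float, bits: int = 5, group_size: int = 64) -> float:
    """Estimated on-disk size in MB for group-wise affine quantization."""
    # Each group of `group_size` weights adds (16 + 16) bits of fp16
    # scale/bias overhead, i.e. 32 / group_size extra bits per weight.
    bits_per_weight = bits + 32 / group_size
    return n_params * bits_per_weight / 8 / 1e6

print(f"{quantized_size_mb(360e6):.0f} MB")  # → 248 MB, ballpark of the measured ~241 MB
```

The small gap versus the measured ~241 MB is expected, since the exact parameter count and which layers stay unquantized vary by build.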
## Usage

```bash
mlx_lm.generate \
  --model Irfanuruchi/SmolLM-360M-Instruct-MLX-5bit \
  --prompt "Reply with exactly 3 bullet points, 4-8 words each: what can you do offline?" \
  --max-tokens 80
```
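The model can also be called from Python through the `mlx_lm` API. A minimal sketch (requires Apple Silicon with the `mlx-lm` package installed; the chat-template step is an assumption that the bundled tokenizer ships an instruct template, as the upstream model does):

```python
# Load and run the 5-bit build via the mlx_lm Python API.
from mlx_lm import load, generate

model, tokenizer = load("Irfanuruchi/SmolLM-360M-Instruct-MLX-5bit")

# Wrap the prompt in the instruct chat template so the model sees a proper turn.
messages = [{"role": "user", "content": "Reply with exactly 3 bullet points, 4-8 words each: what can you do offline?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

text = generate(model, tokenizer, prompt=prompt, max_tokens=80)
print(text)
```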
## License
Upstream SmolLM is released under Apache-2.0; this quantized build preserves attribution and the original license terms.