---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM-360M-Instruct
tags:
  - alignment-handbook
  - trl
  - sft
  - mlx
  - apple-silicon
  - on-device
  - tiny-llm
  - smollm
  - quantized
datasets:
  - Magpie-Align/Magpie-Pro-300K-Filtered
  - bigcode/self-oss-instruct-sc2-exec-filter-50k
  - teknium/OpenHermes-2.5
  - HuggingFaceTB/everyday-conversations-llama3.1-2k
library_name: mlx
language:
  - en
pipeline_tag: text-generation
---

# SmolLM-360M-Instruct (MLX 5-bit)

A 5-bit MLX-quantized build of [HuggingFaceTB/SmolLM-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-360M-Instruct), targeting a better quality/footprint balance than the 3-bit variant.

## Benchmark Environment

- Device: MacBook Pro (M3 Pro)
- Runtime: MLX
- Quantization: 5-bit

## Performance (Measured)

- Disk size: ~241 MB
- Peak memory: ~0.29 GB
- Generation speed: ~296 tokens/sec

These numbers were measured on macOS (M3 Pro); iPhone/iPad performance will vary with hardware and available memory.
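The measured disk size is consistent with a back-of-envelope estimate. The parameter count and MLX quantization defaults below (group size 64 with an fp16 scale and bias per group, adding roughly 0.5 bits per weight) are assumptions, not figures from this card:

```python
# Rough sanity check of the quantized model's disk footprint.
# Assumptions (not from the card): ~360M parameters; MLX default
# group_size=64 with one fp16 scale + one fp16 bias per group.
params = 360_000_000
bits_per_weight = 5 + (16 + 16) / 64  # 5-bit weights + per-group overhead = 5.5
size_mb = params * bits_per_weight / 8 / 1e6
print(f"~{size_mb:.0f} MB")  # close to the measured ~241 MB
```

The small remaining gap is plausibly embeddings, unquantized layers, and file metadata.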

## Usage

```bash
mlx_lm.generate \
  --model Irfanuruchi/SmolLM-360M-Instruct-MLX-5bit \
  --prompt "Reply with exactly 3 bullet points, 4-8 words each: what can you do offline?" \
  --max-tokens 80
```

## License

Upstream SmolLM is released under Apache-2.0; preserve attribution and the original license terms when redistributing this quantized build.