MERaLiON-2-10B-MLX-4bit

4-bit quantized Apple MLX version of MERaLiON/MERaLiON-2-10B.

MERaLiON-2-10B is a multimodal speech-language model developed by I2R, A*STAR (Singapore). It combines a Whisper-large-v3 encoder with a Gemma-2-9B-IT decoder for speech understanding tasks.

Quantization Details

Component Format Size
Decoder (Gemma-2-9B-IT) 4-bit quantized (group_size=64, affine) 4.96 GB
Encoder (Whisper-large-v3) float16 (unquantized) 1.22 GB
Adaptor float16 (unquantized) 0.43 GB
Total ~6.5 GB

Quantized from the original full-precision MERaLiON-2-10B weights (not re-quantized from 8-bit).

Size comparison:

  • Original PyTorch (bfloat16): ~20 GB
  • MLX 8-bit: ~11.6 GB
  • MLX 4-bit (this model): ~6.5 GB (44% smaller than 8-bit)

Model Structure

encoder.safetensors          # Whisper-large-v3 encoder
adaptor.safetensors          # Speech-text adaptor MLP
decoder-00000.safetensors    # 4-bit quantized Gemma-2-9B-IT
decoder/                     # Standalone decoder directory (symlinks)

Usage

The decoder can be used standalone with mlx_lm:

from mlx_lm import load, generate

model, tokenizer = load("majentik/MERaLiON-2-10B-MLX-4bit/decoder")
result = generate(model, tokenizer, prompt="Hello", max_tokens=100)

For full multimodal (speech + text) usage, refer to the original model documentation.

License

This model is released under the MERaLiON Public Licence v3.

Downloads last month
8
MLX
Hardware compatibility
Log In to add your hardware

Quantized

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for majentik/MERaLiON-2-10B-MLX-4bit

Finetuned
(3)
this model