MERaLiON-2-10B-MLX-4bit

4-bit quantized Apple MLX version of MERaLiON/MERaLiON-2-10B.

MERaLiON-2-10B is a multimodal speech-language model developed by I2R, A*STAR (Singapore). It combines a Whisper-large-v3 encoder with a Gemma-2-9B-IT decoder for speech understanding tasks.

Quantization Details

Component	Format	Size
Decoder (Gemma-2-9B-IT)	4-bit quantized (group_size=64, affine)	4.96 GB
Encoder (Whisper-large-v3)	float16 (unquantized)	1.22 GB
Adaptor	float16 (unquantized)	0.43 GB
Total		~6.5 GB

Quantized from the original full-precision MERaLiON-2-10B weights (not re-quantized from 8-bit).

Size comparison:

Original PyTorch (bfloat16): ~20 GB
MLX 8-bit: ~11.6 GB
MLX 4-bit (this model): ~6.5 GB (44% smaller than 8-bit)

Model Structure

encoder.safetensors          # Whisper-large-v3 encoder
adaptor.safetensors          # Speech-text adaptor MLP
decoder-00000.safetensors    # 4-bit quantized Gemma-2-9B-IT
decoder/                     # Standalone decoder directory (symlinks)

Usage

The decoder can be used standalone with mlx_lm:

from mlx_lm import load, generate

model, tokenizer = load("majentik/MERaLiON-2-10B-MLX-4bit/decoder")
result = generate(model, tokenizer, prompt="Hello", max_tokens=100)

For full multimodal (speech + text) usage, refer to the original model documentation.

License

This model is released under the MERaLiON Public Licence v3.

Downloads last month: 8

MLX

Hardware compatibility

Quantized

Model tree for majentik/MERaLiON-2-10B-MLX-4bit

Base model

google/gemma-2-9b

Finetuned

google/gemma-2-9b-it

Finetuned

MERaLiON/MERaLiON-2-10B

Finetuned

(3)

this model