Apertus-8B-Instruct-2509-W8A16

This is an INT8 weight-only quantized version of swiss-ai/Apertus-8B-Instruct-2509 using llm-compressor.

💡 What this means in practice

  • Only the weights are quantized to 8-bit integers (INT8)
  • Activations remain FP16/BF16
  • No FP8 is used in this configuration
  • Faster inference, reduced memory, minimal accuracy loss
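The idea behind weight-only INT8 can be shown in a few lines of numpy: weights are stored as INT8 with a floating-point scale per output channel, while activations are left in higher precision (float32 stands in for FP16 here). This is a toy illustration of the scheme, not the kernel path an inference engine actually uses:

```python
import numpy as np

# Toy illustration of W8A16: INT8 weights with a per-output-channel FP scale,
# activations untouched. float32 stands in for FP16/BF16.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)   # [out_features, in_features]
x = rng.normal(size=(8,)).astype(np.float32)     # activation vector, not quantized

# Symmetric per-channel quantization: map each row's max |w| to 127.
scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
W_int8 = np.clip(np.round(W / scale), -128, 127).astype(np.int8)

# At inference the INT8 weights are dequantized (or the scale is fused
# into the matmul), so the layer still produces floating-point outputs.
W_deq = W_int8.astype(np.float32) * scale

y_ref = W @ x          # full-precision result
y_quant = W_deq @ x    # result with quantized weights
max_err = np.abs(y_ref - y_quant).max()
print(max_err)         # small: each weight is off by at most scale/2
```

Storing `W_int8` plus one scale per row is what halves the weight memory relative to FP16 while keeping the matmul output close to the original.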

Quantization Details

  • Quantization Scheme: W8A16 (INT8 weights, FP16 activations)
  • Method: Weight-only INT8 quantization
  • Targets: All Linear layers
  • Ignored Layers: lm_head (kept in higher precision)
  • Tool: llm-compressor
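The settings above map onto a short llm-compressor recipe. A minimal sketch of how such a checkpoint can be produced (import paths vary between llm-compressor releases, and this is not necessarily the exact script used for this model):

```python
# Sketch: one-shot W8A16 quantization with llm-compressor.
# Requires a GPU and downloads the base model; paths/APIs per recent releases.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",    # quantize all Linear layers...
    scheme="W8A16",      # ...to INT8 weights, FP16 activations
    ignore=["lm_head"],  # keep the output head in higher precision
)

oneshot(
    model="swiss-ai/Apertus-8B-Instruct-2509",
    recipe=recipe,
    output_dir="Apertus-8B-Instruct-2509-W8A16",
)
```

Because W8A16 is weight-only, no calibration dataset is needed for this scheme; the resulting checkpoint can be loaded by engines that support compressed-tensors checkpoints, such as vLLM.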