Quick Start

This repository contains the remote code and weights for a Native Sparse Attention distillation of DeepSeek-R1-Distill-Qwen-1.5B, trained on mathematical reasoning data. Our parameter naming scheme refers to the parameter count of the teacher model.

Installation

To use this model, please ensure the following dependencies are installed:

Install the required Native Sparse Attention library from our custom fork:

pip install git+https://github.com/fnite1604/native-sparse-attention-pytorch.git
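If the fork keeps the upstream package name (an assumption on our part), you can confirm the install with a quick import check:

python -c "import native_sparse_attention_pytorch; print('NSA library found')"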

Install standard dependencies:

pip install transformers torch ...

Note: We recommend using the latest stable release of PyTorch (currently 2.7.0) with CUDA 12.6, along with the latest available version of Transformers.
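For example, a pinned install matching that recommendation might look like the following (the cu126 index is PyTorch's standard wheel index URL for CUDA 12.6 builds; adjust it to your environment):

pip install torch==2.7.0 --index-url https://download.pytorch.org/whl/cu126
pip install --upgrade transformers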

Example Usage

A quick_start.py script is included to help you get started with inference:

python quick_start.py

This will load the model and generate text from a predefined prompt ("What is 1 + 1?") using our Native Sparse Attention-enabled reasoning model.
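For reference, the script is roughly equivalent to the minimal sketch below. The repository id, chat-template usage, and generation settings shown here are assumptions; treat the bundled quick_start.py as authoritative.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; replace with a local path if you have downloaded the weights.
model_id = "doubleblind/DeepSeek-R1-Distill-QweNSA-1.5B"
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code is required because the Native Sparse Attention layers
# are implemented in the repository's remote code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float32,  # the released weights are stored in F32
).to(device)

# The predefined prompt used by quick_start.py.
prompt = "What is 1 + 1?"
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(device)

# max_new_tokens is an illustrative choice, not a value taken from the script.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))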
