metadata
language:
  - en
license: mit
tags:
  - data-science
  - machine-learning
  - deep-learning
  - statistics
  - slm
  - llama-style
  - rope
  - 5m-context
  - from-scratch
  - 1b-params
pipeline_tag: text-generation

Data Scientist-SLM: Role-Based Small Language Model

A LLaMA-style transformer (~989.7M parameters, about 0.99B) trained from scratch for the Data Scientist role. It uses RoPE positional encoding with a configured maximum context of 5M tokens, and was trained with gradient checkpointing to reduce activation memory.
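To illustrate why RoPE suits long contexts, here is a minimal pure-Python sketch of the rotation it applies to query and key vectors. It is not this model's implementation: the function names, the toy 4-dimensional vectors, and the frequency base of 10000 are assumptions chosen for illustration (the real head dimension would be 1600 / 20 = 80).

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles (RoPE)."""
    dim = len(vec)
    assert dim % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(0, dim, 2):
        # Angle for this pair; `base` is an assumed hyperparameter.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query and key vectors (illustrative values only).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.0, 0.5, -0.5, 1.5]

# The same relative offset (3 tokens) at two different absolute positions.
score_near = dot(rope_rotate(q, 10), rope_rotate(k, 7))
score_far = dot(rope_rotate(q, 1010), rope_rotate(k, 1007))
```

The two scores agree up to floating-point error, because the attention score between rotated vectors depends only on the relative offset between positions. This is the property that lets a RoPE model be configured for long contexts without a learned position table.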

Architecture

| Component | Value |
|---|---|
| Architecture | LLaMA-style (RoPE + RMSNorm + SwiGLU) |
| Parameters | 989.7M (0.99B) |
| Layers | 32 |
| Attention heads | 20 |
| Embedding dimension | 1600 |
| Max context | 5,000,000 tokens |
| Max output | 5,000,000 tokens |
| Vocabulary | 2,025 (BPE) |
| Model size | ~4 GB (fp32) |
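As a sanity check, the parameter count and file size can be roughly reproduced from the table. The sketch below makes several assumptions that are not stated in this card: a SwiGLU hidden size of about 8/3 of the embedding dimension, standard multi-head attention without biases, two RMSNorm weights per layer, and untied input and output embeddings. The function name `estimate_params` is hypothetical.

```python
def estimate_params(n_layers=32, d_model=1600, vocab=2025, tied_embeddings=False):
    """Rough parameter count for a LLaMA-style decoder (no biases)."""
    # Assumption: SwiGLU hidden size is about 8/3 of d_model (4266 here).
    ffn_hidden = int(8 * d_model / 3)
    attention = 4 * d_model * d_model      # Q, K, V and output projections
    swiglu = 3 * d_model * ffn_hidden      # gate, up and down projections
    norms = 2 * d_model                    # two RMSNorm weight vectors per layer
    per_layer = attention + swiglu + norms
    # Assumption: separate input embedding and output head unless tied.
    embeddings = vocab * d_model * (1 if tied_embeddings else 2)
    final_norm = d_model
    return n_layers * per_layer + embeddings + final_norm

total = estimate_params()
fp32_gb = total * 4 / 1e9  # 4 bytes per fp32 parameter

print(f"{total / 1e6:.1f}M parameters, {fp32_gb:.2f} GB in fp32")
```

Under these assumptions the estimate comes to about 989.5M parameters and just under 4 GB in fp32, close to the 989.7M and ~4 GB listed above. The small gap suggests the actual hidden size or embedding layout differs slightly from what is assumed here.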

Training

  • Best eval loss: 1.4600185036659241
  • Trained with gradient checkpointing on Apple M4 (MPS)
  • 3 epochs, batch_size=1, grad_accum=16
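The figures above can be turned into two more readable quantities. This sketch assumes the reported eval loss is the mean per-token cross-entropy in nats, which the card does not state explicitly; the variable names are illustrative.

```python
import math

best_eval_loss = 1.4600185036659241  # from the training summary above
batch_size = 1
grad_accum = 16

# Perplexity is exp(loss) when the loss is mean cross-entropy in nats (assumed).
perplexity = math.exp(best_eval_loss)

# With gradient accumulation, one optimizer step sees batch_size * grad_accum sequences.
effective_batch = batch_size * grad_accum

print(f"perplexity ~ {perplexity:.2f}, effective batch = {effective_batch} sequences")
```

Under that assumption the eval perplexity is roughly 4.3, and each optimizer step averages gradients over 16 sequences even though only one sequence is held in memory at a time.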

Usage

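from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

REPO_ID = "sathishphdai/data-scientist-slm-5m"

# Download the weights and the BPE tokenizer from the Hub (cached locally after the first call).
model_path = hf_hub_download(REPO_ID, "model.safetensors")
tokenizer_path = hf_hub_download(REPO_ID, "data_scientist_tokenizer.json")

# Load the tokenizer; the weights at model_path must be loaded into a matching model definition.
tokenizer = Tokenizer.from_file(tokenizer_path)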