BEE-spoke-data/mega-ar-126m-4k
This may not be the best language model, but it is a language model! It's interesting for several reasons, not the least of which is that it's not technically a transformer.
Details:
- 768 hidden size, 12 layers
- no MEGA chunking, 4096 context length
- EMA dimension 16, shared dimension 192
- tokenizer: GPT NeoX
- trained from scratch
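The "EMA dimension" above refers to MEGA's multi-dimensional damped exponential moving average. As a rough illustration, the sketch below re-derives that recurrence for a single input channel from the MEGA paper's description. It is not this checkpoint's implementation, and the function name `damped_ema` and its argument names are hypothetical. Each channel is expanded into `h` EMA dimensions (16 for this model), smoothed with per-dimension factors `alpha` and `delta`, and projected back to a single value.

```python
from typing import List, Sequence


def damped_ema(
    x: Sequence[float],
    alpha: Sequence[float],
    delta: Sequence[float],
    beta: Sequence[float],
    eta: Sequence[float],
) -> List[float]:
    """Damped EMA for one input channel, after the MEGA paper's recurrence.

    Illustrative sketch only; not the code used by this checkpoint.
    """
    h = len(alpha)  # number of EMA dimensions (16 for this model)
    if not (len(delta) == len(beta) == len(eta) == h):
        raise ValueError("alpha, delta, beta and eta must all have length h")
    state = [0.0] * h  # previous hidden state h_{t-1}, one entry per EMA dim
    outputs: List[float] = []
    for x_t in x:
        for j in range(h):
            u_t = beta[j] * x_t  # expand the scalar input into EMA dim j
            # h_t = alpha * u_t + (1 - alpha * delta) * h_{t-1}
            state[j] = alpha[j] * u_t + (1.0 - alpha[j] * delta[j]) * state[j]
        # collapse the h EMA dimensions back to one output: y_t = eta . h_t
        outputs.append(sum(e * s for e, s in zip(eta, state)))
    return outputs


# Plain EMA as a special case: h = 1, delta = 1, beta = eta = 1.
print(damped_ema([1.0, 0.0, 0.0], alpha=[0.5], delta=[1.0], beta=[1.0], eta=[1.0]))
# [0.5, 0.25, 0.125]
```

With `h = 1` and `delta = 1` the recurrence reduces to an ordinary EMA, so an impulse decays geometrically; with `h = 16`, each channel carries 16 such smoothers with their own decay rates.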
For more information on MEGA (and what some of the parameters above mean), check out the model docs or the original paper.
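MEGA's optional chunking splits the sequence into fixed-length chunks and applies attention only within each chunk, so attention cost grows linearly with sequence length for a fixed chunk size. Since this model uses no chunking, attention spans the full 4096-token window. The back-of-the-envelope count below illustrates the difference in attention-score entries per attention layer; the chunk size of 512 is a hypothetical value chosen only for comparison, and the count ignores causal masking.

```python
def full_attention_scores(seq_len: int) -> int:
    """One score per (query, key) pair across the whole sequence."""
    return seq_len * seq_len


def chunked_attention_scores(seq_len: int, chunk_size: int) -> int:
    """Scores when attention is restricted to non-overlapping chunks."""
    if seq_len % chunk_size != 0:
        raise ValueError("sketch assumes seq_len is a multiple of chunk_size")
    num_chunks = seq_len // chunk_size
    return num_chunks * chunk_size * chunk_size  # = seq_len * chunk_size


n = 4096  # this model's context length, attended in full (no chunking)
print(full_attention_scores(n))          # 16777216
print(chunked_attention_scores(n, 512))  # 2097152 (hypothetical chunk size)
```

The ratio between the two counts is `seq_len / chunk_size` (8 here), which is the saving that chunking would buy at the cost of restricting attention to within-chunk tokens.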
A more detailed and useful view (based on expanding the viewer + some reformatting)
Usage
Usage is the same as for any other small text-generation model.
Given the model's small size and architecture, it's probably best to leverage its longer context by adding input context to "see more" rather than "generate more".
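One way to act on this advice is to reserve most of the 4096-token window for input and request short continuations. The helper below is a hypothetical, minimal sketch that operates on token ids you have already produced with the GPT-NeoX tokenizer. It keeps the most recent tokens (left truncation), which is an assumption about which context matters most, not something the model requires.

```python
from typing import List, Sequence

MAX_CONTEXT = 4096  # context length listed in the model details


def fit_prompt(
    token_ids: Sequence[int],
    max_new_tokens: int,
    max_context: int = MAX_CONTEXT,
) -> List[int]:
    """Hypothetical helper: keep the newest tokens so prompt + output fit."""
    if not 0 <= max_new_tokens < max_context:
        raise ValueError("max_new_tokens must be in [0, max_context)")
    budget = max_context - max_new_tokens  # tokens left for the prompt
    # Left-truncate: drop the oldest tokens if the prompt is too long.
    return list(token_ids[-budget:]) if len(token_ids) > budget else list(token_ids)


long_prompt = list(range(5000))  # stand-in for real tokenizer output
trimmed = fit_prompt(long_prompt, max_new_tokens=64)
print(len(trimmed), trimmed[0], trimmed[-1])  # 4032 968 4999
```

A small `max_new_tokens` leaves almost the whole window for input, which matches the "see more" rather than "generate more" usage pattern.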
evals
Initial data:
hf-causal-experimental (pretrained=BEE-spoke-data/mega-ar-126m-4k,revision=main,trust_remote_code=True,dtype='float'), limit: None, provide_description: False, num_fewshot: 0, batch_size: 4
| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| arc_easy | 0 | acc | 0.4415 | ± | 0.0102 |
| | | acc_norm | 0.3969 | ± | 0.0100 |
| boolq | 1 | acc | 0.5749 | ± | 0.0086 |
| lambada_openai | 0 | ppl | 94.9912 | ± | 3.9682 |
| | | acc | 0.2408 | ± | 0.0060 |
| openbookqa | 0 | acc | 0.1660 | ± | 0.0167 |
| | | acc_norm | 0.2780 | ± | 0.0201 |
| piqa | 0 | acc | 0.5974 | ± | 0.0114 |
| | | acc_norm | 0.5914 | ± | 0.0115 |
| winogrande | 0 | acc | 0.4830 | ± | 0.0140 |
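The ± column is the standard error reported by the evaluation harness. As a rough reading aid (a normal-approximation 95% interval, not part of the original evaluation), the sketch below checks whether results on the two-choice tasks (winogrande and piqa, where random guessing scores 0.5) are distinguishable from chance. The helper names `normal_ci` and `overlaps_chance` are hypothetical.

```python
from typing import Tuple

Z_95 = 1.96  # two-sided 95% quantile of the standard normal


def normal_ci(value: float, stderr: float, z: float = Z_95) -> Tuple[float, float]:
    """Normal-approximation confidence interval: value +/- z * stderr."""
    return value - z * stderr, value + z * stderr


def overlaps_chance(value: float, stderr: float, chance: float) -> bool:
    """True if the chance-level score lies inside the interval."""
    lo, hi = normal_ci(value, stderr)
    return lo <= chance <= hi


# acc and stderr copied from the table above; both tasks have two options.
results = {"winogrande": (0.4830, 0.0140), "piqa": (0.5974, 0.0114)}
for task, (acc, se) in results.items():
    lo, hi = normal_ci(acc, se)
    print(f"{task}: {acc:.4f} [{lo:.4f}, {hi:.4f}] "
          f"overlaps chance: {overlaps_chance(acc, se, 0.5)}")
```

With these numbers, winogrande's interval (about 0.456 to 0.510) includes 0.5, while piqa's (about 0.575 to 0.620) does not.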
