Unable to get accurate infilling
According to the model card, infilling works by passing the input as:
<SUF> {some text following cursor} <PRE> {some prelude text here} <MID>
In the example code, the special token IDs are specified as:
<SUF> = 50253
<PRE> = 50254
<MID> = 50255
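The layout described above can be sketched as a small helper. This is only an illustration: `build_fim_input` is a hypothetical name, and the sentinel IDs are passed in as parameters rather than hard-coded, since the right values depend on the tokenizer actually loaded.

```python
from typing import List


def build_fim_input(prefix_ids: List[int], suffix_ids: List[int],
                    suf_id: int, pre_id: int, mid_id: int) -> List[int]:
    """Lay out token IDs as <SUF> suffix <PRE> prefix <MID>."""
    return [suf_id, *suffix_ids, pre_id, *prefix_ids, mid_id]


# Toy IDs for illustration only; real IDs come from the tokenizer.
example = build_fim_input(prefix_ids=[10, 11], suffix_ids=[20, 21],
                          suf_id=1, pre_id=2, mid_id=3)
print(example)
```

The suffix comes first and the prefix second, so the model continues generating from the point right after the prefix.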
However, when I generate completions using those tokens I haven't been able to get any accurate results. For example:
prefix = "def top_k(values):\n"
suffix = " return results"
... infills as:
def top_k(values):
return results.count(values return results
This looks like the suffix is being ignored and the model is just completing after the prefix.
When I decode the special tokens back to text I get:
50253 = ' Outcomes'
50254 = 24 spaces
50255 = 23 spaces
So I'm wondering if those are really the correct tokens to separate the FIM inputs?
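A check like the one described above can be automated. The sketch below is an assumption-laden illustration: `find_sentinel_mismatches` is a hypothetical helper, and `fake_decode` is a stand-in that simply reproduces the decodings reported in this post. With a real tokenizer you would pass `tok.decode` instead.

```python
from typing import Callable, Dict, List


def find_sentinel_mismatches(decode: Callable[[List[int]], str],
                             expected: Dict[int, str]) -> Dict[int, str]:
    """Return {token_id: decoded_text} for each ID that does not decode to its expected sentinel."""
    mismatches = {}
    for token_id, sentinel in expected.items():
        decoded = decode([token_id])
        if decoded != sentinel:
            mismatches[token_id] = decoded
    return mismatches


# Stand-in for tok.decode, reproducing the decodings reported above.
reported = {50253: " Outcomes", 50254: " " * 24, 50255: " " * 23}


def fake_decode(ids: List[int]) -> str:
    return reported[ids[0]]


mismatches = find_sentinel_mismatches(
    fake_decode, {50253: "<SUF>", 50254: "<PRE>", 50255: "<MID>"})
print(mismatches)
```

A non-empty result means the listed IDs do not map to the sentinel strings, which is what the decodings above suggest.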
+1
Thanks for bringing this to our attention! We're looking into it and will get back to you ASAP.
Thank you for raising this concern. It looks like an issue with the tokenizer. Unfortunately, all of our engineers are OOO for the long weekend, but we should have a patch out Tuesday or Wednesday. Thanks.
There was an issue where the sentinel <|SUF|>, <|PRE|>, and <|MID|> tokens did not have the correct IDs in the uploaded tokenizer and model card! Please try clearing the Hugging Face cache and redownloading the model :))
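One way to pick up the fix without clearing the cache by hand is to force a fresh download, then look the sentinel IDs up by name so nothing is hard-coded. This is a minimal sketch, not the maintainers' procedure: `load_fresh_tokenizer` and `sentinel_ids` are hypothetical helper names, and the `transformers` import is deferred so the lookup helper can be used on its own.

```python
from typing import Dict, Sequence

SENTINELS = ("<|SUF|>", "<|PRE|>", "<|MID|>")


def load_fresh_tokenizer(name: str = "CarperAI/FIM-NeoX-1.3B"):
    """Re-download the tokenizer files instead of reusing the local cache."""
    from transformers import AutoTokenizer  # lazy import; needs network access
    return AutoTokenizer.from_pretrained(name, force_download=True)


def sentinel_ids(tokenizer, sentinels: Sequence[str] = SENTINELS) -> Dict[str, int]:
    """Look up each sentinel's ID by name rather than hard-coding it."""
    return {s: tokenizer.convert_tokens_to_ids(s) for s in sentinels}


# Usage (downloads files, so it is not executed here):
#   tok = load_fresh_tokenizer()
#   print(sentinel_ids(tok))
```

If the printed IDs differ from the ones used in your script, the script is still relying on stale values.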
Here is what I get when trying open-ended generation on a simple function:
def score(x,y) -> int:
"""
and infilling with:
def score(x,y) -> int:
"""
<|MID|> (infill here)
"""
score = x + y
return score
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model = AutoModelForCausalLM.from_pretrained("CarperAI/FIM-NeoX-1.3B")
tok = AutoTokenizer.from_pretrained("CarperAI/
# infilling demo
prefix = 'def score(x, y) -> int:\n"""\n'
suffix = '"""\n\n score = x + y\n return score'
model_input = [50277, *tok(suffix)["input_ids"], 50278, *tok(prefix)["input_ids"], 50279]
output = tok.decode(model.generate(torch.IntTensor(model_input).unsqueeze(0), max_length=40)[0])
print(output)
'<|SUF|>"""\n\n score = x + y\n return score<|PRE|>def score(x, y) -> int:\n"""\n<|MID|> score(x, y) -> int\n<|endoftext|>'
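The decoded output above still contains the sentinels, so the infilled text has to be cut out and placed back between the prefix and suffix. The following is a rough sketch under that assumption; `extract_infill` is a hypothetical helper, and the `decoded` string is rebuilt from the output printed above rather than produced by running the model.

```python
def extract_infill(decoded: str, mid: str = "<|MID|>",
                   eos: str = "<|endoftext|>") -> str:
    """Return the text generated after the mid sentinel, without the end-of-text marker."""
    _, _, generated = decoded.partition(mid)
    return generated.split(eos, 1)[0]


prefix = 'def score(x, y) -> int:\n"""\n'
suffix = '"""\n\n score = x + y\n return score'
# Same layout as the decoded output shown above.
decoded = ('<|SUF|>' + suffix + '<|PRE|>' + prefix
           + '<|MID|> score(x, y) -> int\n<|endoftext|>')

infill = extract_infill(decoded)
completed = prefix + infill + suffix
print(completed)
```

If generation stops at `max_length` before emitting `<|endoftext|>`, the helper simply returns everything after `<|MID|>`.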
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# non-infilling demo
prefix = 'def score(x, y) -> int:\n"""\n'
model_input = [*tok(prefix)["input_ids"]]
output = tok.decode(model.generate(torch.IntTensor(model_input).unsqueeze(0), max_length=100)[0])
print(output)
'def score(x, y) -> int:\n"""\n Return the score of the given point.\n """\n return sum(x * y for x, y in zip(x_list, y_list))\n\ndef get_point_score(x, y) -> int:\n """\n Return the score of the given point.\n """\n return sum(x * y for x, y in zip(x_list, y'
Hope this helps! I will also update the model card with this example :)