AttributeError: 'str' object has no attribute 'contiguous'
I tried to run the model following the guide, but encountered the following error:
Traceback (most recent call last):
  File "/root/tmp/uncheatable_eval/zamba_test.py", line 12, in <module>
    outputs = model.generate(**input_ids, max_new_tokens=100)
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/tmp/transformers_zamba/src/transformers/generation/utils.py", line 1743, in generate
    result = self._sample(
  File "/root/tmp/transformers_zamba/src/transformers/generation/utils.py", line 2382, in _sample
    outputs = self(
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/tmp/transformers_zamba/src/transformers/models/zamba/modeling_zamba.py", line 1483, in forward
    outputs = self.model(
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/tmp/transformers_zamba/src/transformers/models/zamba/modeling_zamba.py", line 1319, in forward
    layer_outputs = next(mamba_layers)(
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/tmp/transformers_zamba/src/transformers/models/zamba/modeling_zamba.py", line 1012, in forward
    hidden_states = self.mamba(
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/root/tmp/transformers_zamba/src/transformers/models/zamba/modeling_zamba.py", line 883, in forward
    return self.cuda_kernels_forward(hidden_states, cache_params)
  File "/root/tmp/transformers_zamba/src/transformers/models/zamba/modeling_zamba.py", line 729, in cuda_kernels_forward
    hidden_states = causal_conv1d_fn(hidden_states, conv_weights, self.conv1d.bias, self.activation)
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/causal_conv1d/causal_conv1d_interface.py", line 121, in causal_conv1d_fn
    return CausalConv1dFn.apply(
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs) # type: ignore[misc]
  File "/root/miniconda3/envs/uncheatable/lib/python3.10/site-packages/causal_conv1d/causal_conv1d_interface.py", line 35, in forward
    seq_idx = seq_idx.contiguous() if seq_idx is not None else None
AttributeError: 'str' object has no attribute 'contiguous'
My environment:
Package            Version     Editable project location
------------------ ----------- -----------------------------------
accelerate         0.30.1
Brotli             1.0.9
causal-conv1d      1.2.2.post1
certifi            2024.2.2
charset-normalizer 2.0.4
einops             0.8.0
filelock           3.13.1
fsspec             2024.5.0
gmpy2              2.1.2
huggingface-hub    0.23.2
idna               3.7
Jinja2             3.1.3
mamba-ssm          1.2.2
MarkupSafe         2.1.3
mkl-fft            1.3.8
mkl-random         1.2.4
mkl-service        2.4.0
mpmath             1.3.0
networkx           3.1
ninja              1.11.1.1
numpy              1.26.4
packaging          24.0
pillow             10.3.0
pip                24.0
psutil             5.9.8
PySocks            1.7.1
PyYAML             6.0.1
regex              2024.5.15
requests           2.32.2
rwkv               0.8.26
safetensors        0.4.3
sentencepiece      0.2.0
setuptools         69.5.1
sympy              1.12
tokenizers         0.19.1
torch              2.3.0
torchaudio         2.3.0
torchvision        0.18.0
tqdm               4.66.4
transformers       4.42.0.dev0
triton             2.3.0
typing_extensions  4.11.0
urllib3            2.2.1
v                  1
wheel              0.43.0
Any guidance or suggestions on how to debug and fix this issue would be greatly appreciated!
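When reporting an error like this, it can help to confirm the versions of the packages that appear in the traceback. The helper below is a minimal sketch using only the standard library; the function name `installed_versions` and the package list are illustrative choices, not part of any of the libraries involved.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Packages that show up in the traceback above (illustrative selection).
report = installed_versions(["torch", "transformers", "causal-conv1d", "mamba-ssm"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this short report alongside the traceback is usually enough to spot a mismatch between a model implementation and a kernel package.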
Thank you for pointing that out! The issue was caused by a change in the argument list of the causal_conv1d_fn function in a recent update of the causal-conv1d package.
We have updated the implementation of Zamba to reflect this change; it should now work as expected! Please don't hesitate to reach out if you have further issues or questions.
Closing. Please reopen if this doesn't resolve your issue! :)
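For readers who hit the same error elsewhere, the failure can be sketched without the real package. The traceback shows `self.activation` passed as the fourth positional argument and then failing on `seq_idx.contiguous()`. That is consistent with a new `seq_idx` parameter having been inserted ahead of the activation argument, which is an assumption here. The functions `conv_old` and `conv_new` below are hypothetical stand-ins, not the causal-conv1d API.

```python
# Hypothetical stand-ins for illustration only; these are NOT the real
# causal-conv1d functions, and their signatures are assumptions.

def conv_old(x, weight, bias=None, activation=None):
    """Older-style signature: activation is the 4th positional parameter."""
    return {"seq_idx": None, "activation": activation}


def conv_new(x, weight, bias=None, seq_idx=None, activation=None):
    """Newer-style signature: seq_idx now sits before activation."""
    # Same shape as the failing line in the traceback: a str has no .contiguous().
    seq_idx = seq_idx.contiguous() if seq_idx is not None else None
    return {"seq_idx": seq_idx, "activation": activation}


def call_positionally(fn):
    # The 4th positional argument is intended to be the activation name.
    return fn("x", "w", None, "silu")


def call_with_keywords(fn):
    # Keyword arguments stay correct when parameters are inserted.
    return fn("x", "w", bias=None, activation="silu")


try:
    call_positionally(conv_new)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
print(call_with_keywords(conv_new))
```

Passing optional arguments by keyword, as in `call_with_keywords`, keeps a call site working when a dependency inserts a new parameter in the middle of its signature.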