---
language:
- multilingual
license: other
license_name: mii-1.0
license_link: LICENSE
tags:
- chat
- on-device
- agents
- rag
pipeline_tag: text-generation
library_name: transformers
---
# Nesso-4B ⚡
*Alighiero Boetti: "mettere al mondo il mondo" ("bringing the world into the world")*
## Overview
**Nesso-4B** is your small on-device everyday assistant: a highly versatile 4B parameter language model designed for efficient deployment on consumer hardware while maintaining strong performance across diverse tasks.
### Key Features
- **On-Device Ready**: Optimized for local deployment
- **Highly Versatile**: Excels at RAG applications, agentic workflows, tool use, and general assistance
- **Multilingual**: Supports multiple languages with strong cross-lingual capabilities
### Model Specifications
- **Parameters**: 4.0B
- **License**: Mii Open License 1.0
## Quickstart
### Installation
Ensure you have the latest version of `transformers`:
```bash
pip install "transformers>=4.51.0"
```
### Basic Usage (streaming)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name = "mii-llm/nesso-4B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Stream tokens to stdout as they are generated, omitting the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short story about AI."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=50
)
```
## Deployment
### vLLM
```bash
pip install "vllm>=0.8.5"
vllm serve mii-llm/nesso-4B --enable-auto-tool-choice --tool-call-parser hermes
```
This creates an OpenAI-compatible API endpoint that you can use with standard clients.
**Note**: If you encounter out-of-memory (OOM) issues, consider reducing the context length, e.g. `--max-model-len 32768` or `--max-model-len 16384`.
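Once the server is running, any OpenAI-compatible client can talk to it. A minimal sketch using only the Python standard library (the URL assumes vLLM's default port `8000`; the prompt is illustrative):

```python
import json
import urllib.request

# Chat-completions request body (OpenAI-compatible schema).
payload = {
    "model": "mii-llm/nesso-4B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize retrieval-augmented generation in one sentence."},
    ],
    "max_tokens": 256,
    "temperature": 0.7,
}
body = json.dumps(payload).encode("utf-8")

# Uncomment to send once `vllm serve` is running:
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The same endpoint also works with the official `openai` client library by pointing its `base_url` at the server.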
### Local Applications
Nesso is also supported by popular local inference applications:
- **Ollama**: For easy command-line usage
- **LM Studio**: For GUI-based interaction
- **llama.cpp**: For lightweight CPU/GPU inference via GGUF
- **MLX-LM**: For Apple Silicon optimization
## Best Practices
### Quantization
For reduced memory usage, load the model quantized via `bitsandbytes` (requires the `bitsandbytes` package):
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# INT8
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto"
)

# INT4
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto"
)
```
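As a back-of-envelope check (weights only, ignoring KV cache and runtime overhead), halving the bits per parameter halves the footprint. For the 4.0B parameters listed above:

```python
PARAMS = 4.0e9  # Nesso-4B parameter count (from Model Specifications)

def weight_gib(bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return PARAMS * bits_per_param / 8 / 2**30

for name, bits in [("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: ~{weight_gib(bits):.1f} GiB")
# fp16/bf16: ~7.5 GiB, int8: ~3.7 GiB, int4: ~1.9 GiB
```

Actual usage will be higher: activations, the KV cache (which grows with context length), and framework overhead all add on top of the weights.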
## Tips for Best Results
1. **Be Specific**: Clear, detailed prompts yield better results
2. **Use Examples**: Provide few-shot examples for complex tasks
3. **Iterate**: Refine your prompts based on outputs
4. **Set Expectations**: Use system prompts to define the assistant's role
5. **Manage Context**: Keep context relevant and well-organized
6. **Adjust Temperature**: Lower for factual tasks, higher for creative ones
7. **Use Tools**: Leverage agentic capabilities for complex workflows
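Tips 2 and 4 combine naturally in the chat format: a system prompt defines the role, and few-shot examples ride along as prior user/assistant turns. A sketch with a hypothetical sentiment-labeling task:

```python
# Few-shot examples encoded as prior chat turns (hypothetical task).
few_shot = [
    {"role": "user", "content": "Review: 'Loved it!'"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Review: 'Waste of money.'"},
    {"role": "assistant", "content": "negative"},
]

messages = [
    {"role": "system", "content": "You label movie reviews as 'positive' or 'negative'."},
    *few_shot,
    {"role": "user", "content": "Review: 'A delightful surprise.'"},
]
# Pass `messages` to tokenizer.apply_chat_template(...) exactly as in the Quickstart.
```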
## License
This model is released under the Mii Open License 1.0 (see the `LICENSE` file).
## Citation
If you use Nesso in your work, please cite:
```bibtex
@misc{nesso-4b,
  author = {mii-llm},
  title = {Nesso-4B: Your Small On-Device Everyday Assistant},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/mii-llm/nesso-4B}
}
```
## Acknowledgments
Built with [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).

This model is licensed under the Mii Open License 1.0: it is free for research and personal use, production deployment requires prior written permission, and commercial use requires a separate commercial license. Citation is required for all uses. Contact us for permissions.