---
language:
- multilingual
license: other
license_name: mii-1.0
license_link: LICENSE
tags:
- chat
- on-device
- agents
- rag
pipeline_tag: text-generation
library_name: transformers
---

# Nesso-4B ⚡
Nesso - Your small on-device everyday assistant
Alighiero Boetti - *mettere al mondo il mondo* ("to bring the world into the world")
## Overview

**Nesso-4B** is your small on-device everyday assistant: a versatile 4B-parameter language model designed for efficient deployment on consumer hardware while maintaining strong performance across diverse tasks.

### Key Features

- **On-Device Ready**: Optimized for local deployment
- **Highly Versatile**: Excels at RAG applications, agentic workflows, tool use, and general assistance
- **Multilingual**: Supports multiple languages with strong cross-lingual capabilities

### Model Specifications

- **Parameters**: 4.0B
- **License**: Mii Open License 1.0

## Quickstart

### Installation

Ensure you have a recent version of `transformers`:

```bash
pip install "transformers>=4.51.0"
```

### Basic Usage (streaming)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name = "mii-llm/nesso-4B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

streamer = TextStreamer(tokenizer, skip_prompt=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short story about AI."},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=50,
)
```

## Deployment

### vLLM

```bash
pip install "vllm>=0.8.5"
vllm serve mii-llm/nesso-4B --enable-auto-tool-choice --tool-call-parser hermes
```

This creates an OpenAI-compatible API endpoint that you can use with standard clients.

**Note**: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32768` or `16384`.
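Once the server is up, the endpoint can be queried with the official `openai` Python client. A minimal sketch, assuming the default local port `8000` (vLLM does not check the API key by default, so any placeholder works):

```python
def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Send a single chat turn to an OpenAI-compatible endpoint (e.g. vLLM)."""
    from openai import OpenAI  # pip install openai

    # The key is a placeholder: vLLM ignores it unless auth is configured.
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    response = client.chat.completions.create(
        model="mii-llm/nesso-4B",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.7,
        max_tokens=256,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(chat("Summarize the benefits of on-device LLMs in one sentence."))
```

The same client works unchanged against any OpenAI-compatible server, so you can point `base_url` at a different host or port if you changed the vLLM defaults.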
### Local Applications

Nesso is also supported by popular local inference applications:

- **Ollama**: For easy command-line usage
- **LM Studio**: For GUI-based interaction
- **llama.cpp**: For lightweight C/C++ deployment
- **MLX-LM**: For Apple Silicon optimization

## Best Practices

### Quantization

For reduced memory usage, load the model with 8-bit or 4-bit weights via `bitsandbytes`:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "mii-llm/nesso-4B"

# INT8
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# INT4
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```

## Tips for Best Results

1. **Be Specific**: Clear, detailed prompts yield better results
2. **Use Examples**: Provide few-shot examples for complex tasks
3. **Iterate**: Refine your prompts based on outputs
4. **Set Expectations**: Use system prompts to define the assistant's role
5. **Manage Context**: Keep context relevant and well-organized
6. **Adjust Temperature**: Lower for factual tasks, higher for creative ones
7. **Use Tools**: Leverage agentic capabilities for complex workflows

## License

This model is released under the Mii Open License 1.0. It is free for research and personal use; production deployment requires prior written permission, and commercial use by entities requires a separate commercial license. Citation is required for all uses. Contact us for permissions.

## Citation

If you use Nesso in your work, please cite:

```bibtex
@misc{nesso-4b,
  author    = {mii-llm},
  title     = {Nesso-4B: Your Small On-Device Everyday Assistant},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/mii-llm/nesso-4B}
}
```

## Acknowledgments

Built with [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl).