Instructions for using tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with libraries and local apps:
- Libraries
- llama-cpp-python
How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

# Downloads the BF16 GGUF file from the Hugging Face Hub (cached after the first run).
llm = Llama.from_pretrained(
    repo_id="tletai/phi-4-mini-instruct-4b-usm-tau-py-0003",
    filename="phi-4-mini-instruct.BF16.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
# The reply text is in the first choice's message.
print(response["choices"][0]["message"]["content"])
- Local Apps
- llama.cpp
How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
Use Docker
docker model run hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "tletai/phi-4-mini-instruct-4b-usm-tau-py-0003"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tletai/phi-4-mini-instruct-4b-usm-tau-py-0003",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
- Ollama
How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with Ollama:
ollama run hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
- Unsloth Studio
How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 to start chatting
Using Hugging Face Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 to start chatting
- Pi
How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory: pi
- Hermes Agent
How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with Docker Model Runner:
docker model run hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
- Lemonade
How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
Run and chat with the model
lemonade run user.phi-4-mini-instruct-4b-usm-tau-py-0003-Q4_K_M
List all available models
lemonade list
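Both llama-server and vLLM expose the OpenAI-compatible /v1/chat/completions endpoint used in the curl example above, so one small client covers either. The following is a minimal sketch using only Python's standard library (urllib, json); the helper names build_chat_request and send_chat are hypothetical, and the ports assume the defaults used on this card (8080 for llama-server, 8000 for vLLM).

```python
import json
import urllib.request

# Model ID as used in the server examples on this card.
MODEL_ID = "tletai/phi-4-mini-instruct-4b-usm-tau-py-0003"


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses put the reply in the first choice's message.
    return body["choices"][0]["message"]["content"]
```

For example, send_chat(build_chat_request("http://localhost:8080/v1", MODEL_ID, "Write a Python function that reverses a string.")) queries a running llama-server; use http://localhost:8000/v1 for vLLM instead. Keeping request construction separate from sending makes the payload easy to inspect without a server.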
phi-4-mini-instruct-4b-usm-tau-py-0003 (GGUF)

Internal Model Name: Tau0-Py-003-4B-ir
How TLET AI public and internal model names work:
- Public Model Name: {basemodel}-{parameters}-{type}-{series}-{hyperspecialization (USM type only)}-{release type number}{versioning number (three times as many digits as the release type number)}
- So: phi-4-mini-instruct-4b-usm-tau-py-0003 means:
- Base: phi-4-mini-instruct (specifically unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit)
- Parameters: 4 Billion
- Type: USM (Ultra Specialized Model)
- Series: Tau (Proof of Concept model series)
- Hyperspecialization (is USM): Python
- Release Type: 0, Proof of Concept/Early Works stage
- Release Version: 003; this is the third model in the Tau series (the series includes fine-tunes of other base models).
- Internal/Private Model Name: {series}{release type number}-{hyperspecialization}-{versioning number (three times as many digits as the release type number)}-{parameters}-{ir, if the model is Inference Ready (Ollama)}
- So: Tau0-Py-003-4B-ir means:
- Series: Tau (Proof of Concept model series)
- Release Type: 0, Proof of Concept/Early Works stage
- Hyperspecialization (is USM): Python
- Release Version: 003; this is the third model in the Tau series (the series includes fine-tunes of other base models).
- Parameters: 4 Billion
- ir: the model is Inference Ready (Ollama).
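The naming scheme above can be checked mechanically. Below is a minimal sketch that parses a public name and derives the internal name from it. It assumes, from the single example on this card, that only USM models carry a hyperspecialization segment, that segments are separated by single hyphens, and that the internal name capitalizes the series and hyperspecialization and upper-cases the parameter count; all function and class names are hypothetical, not part of any TLET AI tooling.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ModelName:
    base: str                           # e.g. "phi-4-mini-instruct"
    parameters: str                     # e.g. "4b"
    model_type: str                     # e.g. "usm"
    series: str                         # e.g. "tau"
    hyperspecialization: Optional[str]  # e.g. "py" (assumed: USM models only)
    release_type: str                   # e.g. "0"
    version: str                        # e.g. "003"


def split_release(token: str) -> Tuple[str, str]:
    """Split '0003' into release type '0' and version '003'.

    The version has three times as many digits as the release type,
    so the combined token length must be a multiple of 4.
    """
    if not token.isdigit() or len(token) % 4 != 0:
        raise ValueError(f"malformed release token: {token!r}")
    k = len(token) // 4
    return token[:k], token[k:]


def parse_public_name(name: str) -> ModelName:
    """Parse {base}-{params}-{type}-{series}[-{hyper}]-{release}{version}."""
    tokens = name.split("-")
    release_type, version = split_release(tokens[-1])
    if len(tokens) >= 6 and tokens[-4].lower() == "usm":
        # USM layout: ...-{params}-usm-{series}-{hyper}-{release}
        base = "-".join(tokens[:-5])
        params, mtype, series, hyper = tokens[-5], tokens[-4], tokens[-3], tokens[-2]
    elif len(tokens) >= 5:
        # Assumed non-USM layout: ...-{params}-{type}-{series}-{release}
        base = "-".join(tokens[:-4])
        params, mtype, series, hyper = tokens[-4], tokens[-3], tokens[-2], None
    else:
        raise ValueError(f"too few segments in {name!r}")
    return ModelName(base, params, mtype, series, hyper, release_type, version)


def internal_name(m: ModelName, inference_ready: bool) -> str:
    """Build {Series}{release}-{Hyper}-{version}-{PARAMS}[-ir]."""
    parts = [f"{m.series.capitalize()}{m.release_type}"]
    if m.hyperspecialization:
        parts.append(m.hyperspecialization.capitalize())
    parts += [m.version, m.parameters.upper()]
    if inference_ready:
        parts.append("ir")
    return "-".join(parts)
```

For this model, internal_name(parse_public_name("phi-4-mini-instruct-4b-usm-tau-py-0003"), inference_ready=True) returns Tau0-Py-003-4B-ir, matching the internal name above. Parsing from the right is what lets the hyphenated base model name pass through intact.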
Ollama Commands (with recommended Q5_K_M quantization)
Pull
ollama pull hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q5_K_M
Run Command
ollama run hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q5_K_M
Aliasing
While the model is running, you can save it under its much simpler internal name for easier use with Ollama. Run this command inside the interactive session, i.e. after ollama run has started and you are already chatting with the model in the CLI:
>>> /save tau0-py-003-4B-ir:Q5_K_M
This will allow you to run it using this command instead:
ollama run tau0-py-003-4B-ir:Q5_K_M
You can drop parts you don't want from the /save name, such as "-4B-ir" or ":Q5_K_M". The tag isn't really needed if you only plan to download a single quantization anyway; without one, Ollama uses the default tag :latest.
Deleting Aliases
This command will remove the alias but keep the model:
ollama rm tau0-py-003-4B-ir:Q5_K_M
Complete Deletion
To fully remove the model, remove all aliases and also remove the original pull:
ollama rm hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q5_K_M
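If you prefer to script the alias setup instead of typing /save inside a chat session, the ollama cp command copies an already pulled model under a new name without opening a chat. The sketch below is a hypothetical helper that builds the pull, alias, and removal commands from this card and executes them through subprocess only when dry_run is False; it assumes the ollama CLI is on your PATH.

```python
import subprocess
from typing import List

# Names taken from this card; the alias is the suggested internal name.
SOURCE = "hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q5_K_M"
ALIAS = "tau0-py-003-4B-ir:Q5_K_M"


def alias_commands(source: str, alias: str) -> List[List[str]]:
    """Pull the model, then copy it under a shorter alias."""
    return [["ollama", "pull", source], ["ollama", "cp", source, alias]]


def removal_commands(source: str, aliases: List[str]) -> List[List[str]]:
    """Remove every alias first, then the original pull (complete deletion)."""
    return [["ollama", "rm", a] for a in aliases] + [["ollama", "rm", source]]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)
```

Calling run_all(alias_commands(SOURCE, ALIAS), dry_run=False) sets up the alias, and run_all(removal_commands(SOURCE, [ALIAS]), dry_run=False) removes everything again. The default dry run just prints the commands so you can review them first.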
Fine-tuning
- Base model: unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit
- Done using QLoRA and the Paged AdamW 32-bit optimizer with an 8K context length on an NVIDIA RTX 3060 Ti (8GB VRAM) for 6h47m, at roughly 112W power draw most of the time, with occasional spikes to at most 217W (of a theoretical maximum of 225W).
- Tokens during training: 4,699,303.
- Epochs completed: 1.33 of 2 planned epochs (about 67%). The run was stopped early because its runtime grew too long for a proof of concept. More precisely, in our configuration it reached step 3099 of 4654.
- Training was done with Unsloth Studio, which substantially increased training efficiency and speed.
- If you need specifics for research purposes, possible collaboration, or fine-tuning a model yourself, or are just curious, feel free to reach out. We no longer have detailed, timestamped power-usage data: it was discarded immediately after use, so please do not ask for it.
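The training figures above can be cross-checked with a little arithmetic. This sketch derives average throughput, the fraction of the planned run that completed, and a rough energy estimate. The energy figure assumes the reported wattage is the GPU's draw and that it held at about 112W for the whole run, so it is an estimate of GPU energy only, not a measurement.

```python
# Figures reported in the fine-tuning notes above.
TOKENS = 4_699_303
RUNTIME_S = 6 * 3600 + 47 * 60          # 6h47m in seconds
STEPS_DONE, STEPS_TOTAL = 3099, 4654
TARGET_EPOCHS = 2
TYPICAL_WATTS, PEAK_WATTS = 112, 217

throughput = TOKENS / RUNTIME_S               # average tokens per second
progress = STEPS_DONE / STEPS_TOTAL           # fraction of planned steps completed
epochs_done = TARGET_EPOCHS * progress        # epochs actually completed
# Energy = power x time; 1 kWh = 3.6e6 joules. Assumes constant draw (estimate only).
energy_kwh = TYPICAL_WATTS * RUNTIME_S / 3.6e6
energy_upper_kwh = PEAK_WATTS * RUNTIME_S / 3.6e6  # if the peak had held throughout

print(f"throughput ~ {throughput:.0f} tokens/s")
print(f"progress ~ {progress:.1%}, epochs ~ {epochs_done:.2f}")
print(f"GPU energy ~ {energy_kwh:.2f} kWh (upper bound {energy_upper_kwh:.2f} kWh)")
```

Running it gives roughly 192 tokens/s, 66.6% of the planned steps (1.33 epochs, consistent with the figures reported above), and about 0.76 kWh of GPU energy, with 1.47 kWh as an upper bound at the 217W peak.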
System Specifications
- CPU: 1x Intel Core i9-12900KF
- RAM: 4x 16GB DDR4, 3600MHz (64GB total)
- GPUs: 1x NVIDIA GeForce RTX 3060 Ti (8GB VRAM)
- OS: Windows 10 (native)
Available GGUF quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
Model tree for tletai/phi-4-mini-instruct-4b-usm-tau-py-0003
- Base model: microsoft/Phi-4-mini-instruct