Instructions for using tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with libraries and local apps.

Libraries

How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="tletai/phi-4-mini-instruct-4b-usm-tau-py-0003",
	filename="phi-4-mini-instruct.BF16.gguf",
)

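response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# Print only the assistant's reply text from the OpenAI-style response dict.
print(response["choices"][0]["message"]["content"])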

Local Apps

llama.cpp

How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M

Use Docker

docker model run hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M

vLLM

How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "tletai/phi-4-mini-instruct-4b-usm-tau-py-0003"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tletai/phi-4-mini-instruct-4b-usm-tau-py-0003",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
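
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM or llama.cpp. It assumes a server is already running and exposes the OpenAI-compatible `/v1/chat/completions` route, as in the curl call above.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to a running server and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the vLLM server from above running, `chat("http://localhost:8000/v1", "tletai/phi-4-mini-instruct-4b-usm-tau-py-0003", "What is the capital of France?")` should return the reply text. Pointing `base_url` at `http://localhost:8080/v1` targets a local `llama-server` instead.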

Ollama
How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with Ollama:
```
ollama run hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
```

Unsloth Studio

How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 to start chatting

Pi

How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with Docker Model Runner:
```
docker model run hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M
```

Lemonade

How to use tletai/phi-4-mini-instruct-4b-usm-tau-py-0003 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q4_K_M

Run and chat with the model

lemonade run user.phi-4-mini-instruct-4b-usm-tau-py-0003-Q4_K_M

List all available models

lemonade list

phi-4-mini-instruct-4b-usm-tau-py-0003 (GGUF)

Internal Model Name: Tau0-Py-003-4B-ir

How TLET AI public and internal model names work:

Public Model Name: {basemodel}-{parameters}-{type}-{series}-{hyperspecialization (USM type only)}-{release type number}{version number (3x as many digits as the release type number)}
So phi-4-mini-instruct-4b-usm-tau-py-0003 means:
- Base: phi-4-mini-instruct (specifically unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit)
- Parameters: 4 billion
- Type: USM (Ultra Specialized Model)
- Series: Tau (Proof of Concept model series)
- Hyperspecialization (USM only): Python
- Release Type: 0 (Proof of Concept/Early Works stage)
- Release Version: 003, the 3rd model in the Tau series (the count includes fine-tunes of other base models)
Internal/Private Model Name: {series}{release type number}-{hyperspecialization}-{version number (3x as many digits as the release type number)}-{parameters}-{ir, if the model is Inference Ready (Ollama)}
So Tau0-Py-003-4B-ir means:
- Series: Tau (Proof of Concept model series)
- Release Type: 0 (Proof of Concept/Early Works stage)
- Hyperspecialization (USM only): Python
- Release Version: 003, the 3rd model in the Tau series (the count includes fine-tunes of other base models)
- Parameters: 4B (4 billion)
- Inference ready: yes (the -ir suffix)
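
The naming scheme above can be sketched as a small parser. This is an illustration inferred from the single example in this card, not an official TLET AI tool: the regular expression, the `ModelName` class, and `parse_public_name` are all hypothetical, and the sketch only handles USM-type names (where the hyperspecialization field is present).

```python
import re
from dataclasses import dataclass

# Pattern inferred from the one example name in this card (assumption).
PUBLIC_NAME = re.compile(
    r"^(?P<base>.+)-(?P<params>\d+b)-(?P<type>[a-z]+)-(?P<series>[a-z]+)"
    r"-(?P<spec>[a-z]+)-(?P<digits>\d+)$"
)


@dataclass(frozen=True)
class ModelName:
    base: str
    parameters: str
    model_type: str
    series: str
    specialization: str
    release_type: str
    version: str

    def internal_name(self, inference_ready: bool = True) -> str:
        """Rebuild the internal name, appending -ir for inference-ready models."""
        name = (
            f"{self.series.capitalize()}{self.release_type}-"
            f"{self.specialization.capitalize()}-{self.version}-"
            f"{self.parameters.upper()}"
        )
        return name + "-ir" if inference_ready else name


def parse_public_name(name: str) -> ModelName:
    """Split a public USM model name into its components."""
    match = PUBLIC_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised public model name: {name!r}")
    digits = match["digits"]
    # The version has 3x as many digits as the release type, so the
    # trailing number must have a length divisible by 4.
    if len(digits) % 4 != 0:
        raise ValueError(f"unexpected release/version digits: {digits!r}")
    split = len(digits) // 4
    return ModelName(
        base=match["base"],
        parameters=match["params"],
        model_type=match["type"],
        series=match["series"],
        specialization=match["spec"],
        release_type=digits[:split],
        version=digits[split:],
    )
```

For this model, `parse_public_name("phi-4-mini-instruct-4b-usm-tau-py-0003").internal_name()` yields `Tau0-Py-003-4B-ir`.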

Ollama Commands (with recommended Q5_K_M quantization)

Pull

ollama pull hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q5_K_M

Run Command

ollama run hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q5_K_M

Aliasing

While the model is running, you can use the following command to save it under its much simpler internal name for easier use with Ollama. (Run this command AFTER ollama run, that is, while you are already chatting with the model in the CLI.)

>>> /save tau0-py-003-4B-ir:Q5_K_M

This will allow you to run it using this command instead:

ollama run tau0-py-003-4B-ir:Q5_K_M

You can drop any parts you don't want from the /save command, such as "-4B-ir" or ":Q5_K_M" (the latter isn't really needed if you only plan to download a single quantization).

Deleting Aliases

This command will remove the alias but keep the model:

ollama rm tau0-py-003-4B-ir:Q5_K_M

Complete Deletion

To fully remove the model, remove all aliases and also remove the original pull:

ollama rm hf.co/tletai/phi-4-mini-instruct-4b-usm-tau-py-0003:Q5_K_M

Fine-tuning

Base model: unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit
Fine-tuned with QLoRA and the paged AdamW 32-bit optimizer at an 8K context length on an NVIDIA RTX 3060 Ti (8GB VRAM) for 6h47m. Power draw was about 112W for most of the run, with occasional spikes to at most 217W (of a theoretical maximum of 225W).
Tokens during training: 4,699,303.
Epochs completed: 1.33 (67% of the 2 target epochs; the run was getting too long for a proof of concept, so it was cancelled. More precisely, in our config: 3099/4654 steps.)
Done using Unsloth Studio, which greatly increased training efficiency and speed.
If you need specifics for research purposes, possible collaboration, fine-tuning a model yourself, or are just curious, feel free to reach out. We no longer have specific, timed power usage data: it was discarded immediately after use, so please do not ask for it.
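
The figures above can be cross-checked with a few lines of arithmetic. This is a rough sketch, not measured data: the throughput assumes the token count covers the whole 6h47m run, and the energy estimate assumes a constant 112W GPU draw, ignoring the spikes and the rest of the system.

```python
# Figures quoted in the Fine-tuning section of this card.
STEPS_DONE, STEPS_TARGET = 3099, 4654
TARGET_EPOCHS = 2
TOKENS = 4_699_303
RUNTIME_S = 6 * 3600 + 47 * 60   # 6h47m in seconds
TYPICAL_POWER_W = 112            # assumed constant for the estimate

progress = STEPS_DONE / STEPS_TARGET               # fraction of planned steps
epochs = progress * TARGET_EPOCHS                  # epochs actually completed
tokens_per_s = TOKENS / RUNTIME_S                  # average training throughput
energy_kwh = TYPICAL_POWER_W * RUNTIME_S / 3.6e6   # joules -> kWh

print(f"progress:   {progress:.1%}")
print(f"epochs:     {epochs:.2f}")
print(f"throughput: {tokens_per_s:.0f} tokens/s")
print(f"GPU energy: {energy_kwh:.2f} kWh (rough estimate)")
```

The step count reproduces the stated 67% and 1.33 epochs, and the run averages roughly 192 tokens per second under these assumptions.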

System Specifications

CPU: 1x Intel Core i9-12900KF
RAM: 4x 16GB (DDR4, 3600MHz), 64GB total
GPUs:
- 1x NVIDIA GeForce RTX 3060 Ti (8GB of VRAM)
OS: Windows 10 (native)

GGUF

Model size: 4B params
Architecture: phi3

Hardware compatibility

2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit

Model tree for tletai/phi-4-mini-instruct-4b-usm-tau-py-0003

Base model: microsoft/Phi-4-mini-instruct
Quantized: unsloth/Phi-4-mini-instruct-unsloth-bnb-4bit