Instructions to use dcostenco/prism-coder-14b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use dcostenco/prism-coder-14b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="dcostenco/prism-coder-14b",
	filename="prism-aac-14b-q4km.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use dcostenco/prism-coder-14b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf dcostenco/prism-coder-14b
# Run inference directly in the terminal:
llama-cli -hf dcostenco/prism-coder-14b

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf dcostenco/prism-coder-14b
# Run inference directly in the terminal:
llama-cli -hf dcostenco/prism-coder-14b

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf dcostenco/prism-coder-14b
# Run inference directly in the terminal:
./llama-cli -hf dcostenco/prism-coder-14b

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf dcostenco/prism-coder-14b
# Run inference directly in the terminal:
./build/bin/llama-cli -hf dcostenco/prism-coder-14b

Use Docker

docker model run hf.co/dcostenco/prism-coder-14b

LM Studio
Jan
Ollama
How to use dcostenco/prism-coder-14b with Ollama:
```
ollama run hf.co/dcostenco/prism-coder-14b
```

Unsloth Studio new

How to use dcostenco/prism-coder-14b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for dcostenco/prism-coder-14b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for dcostenco/prism-coder-14b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for dcostenco/prism-coder-14b to start chatting

Pi new

How to use dcostenco/prism-coder-14b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf dcostenco/prism-coder-14b

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "dcostenco/prism-coder-14b"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use dcostenco/prism-coder-14b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf dcostenco/prism-coder-14b

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default dcostenco/prism-coder-14b

Run Hermes

hermes

Docker Model Runner
How to use dcostenco/prism-coder-14b with Docker Model Runner:
```
docker model run hf.co/dcostenco/prism-coder-14b
```

Lemonade

How to use dcostenco/prism-coder-14b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull dcostenco/prism-coder-14b

Run and chat with the model

lemonade run user.prism-coder-14b-{{QUANT_TAG}}

List all available models

lemonade list

prism-coder:14b — Dual-Purpose: Tool Routing + Healthcare TypeScript Coder

Fine-tuned Qwen3-14B for the Prism AAC / Synalux healthcare platform.

Two trained capabilities in one model family:

Routing (v36): 6-tool routing for Prism MCP sessions — 100% BFCL
Coding (v42): Synalux-pattern TypeScript code generation — 22/22 checks (100%)

Coding Eval — v42 (Current Production Coder)

22/22 (100%) on the Synalux healthcare TypeScript eval.

Task: write a production Next.js API route for X12 835 ERA reconciliation against existing 837P claims.

Check	Pass
withAudit wrapper	✓
authenticateRequest	✓
supabaseAdmin (not client)	✓
cross-tenant guard (workspace_members + BILLING_ROLES)	✓
UUID_RX validation	✓
decryptPhi before PHI access	✓
HIPAA audit (hipaa_access_log)	✓
HIPAA non-blocking (.then)	✓
409 already-reconciled guard	✓
422 no CLP segments	✓
parse CLP segment	✓
parse SVC segment	✓
parse CAS CO (contractual) adjustment	✓
parse CAS PR (patient responsibility)	✓
GL cash_received entry	✓
GL contractual_adjustment entry	✓
GL patient_ar entry	✓
claim status map (1=paid)	✓
claim status map (4=denied)	✓
no postgres detail in 500	✓
belt-and-suspenders workspace_id eq on update	✓
marks ERA file reconciled	✓

Training chain: Qwen3-14B → v34 (1000-iter routing, 18/22) → v39 (HIPAA+CAS patch, 20/22) → v42 (claim status patch, 22/22).

v42 Training Details

Base: Qwen/Qwen3-14B (BF16)
Corpus: v28 Synalux codebase SFT + targeted patch (claim status × 50 examples, resume from v39)
Training: MLX LoRA, rank=16, 8 layers, 100 iters, LR=5e-7
Final loss: 0.036 (converged)
Merge: direct safetensors LoRA merge → GGUF F16 → Q4_K_M

BFCL Routing Benchmark — v36

Mean: 100.0% PERFECT (3-seed average, seeds 2027/2028/2029, 102 cases each)

Category	Accuracy
aac (AAC phrase requests)	100%
cmpct (ledger compaction)	100%
edge (multi-step compound)	100%
hand (agent handoff)	100%
info (general facts)	100%
irrel (irrelevant/live queries)	100%
know (knowledge base search)	100%
load (session context loading)	100%
pred (factual queries)	100%
save (session ledger save)	100%
smem (session memory search)	100%
tran (translation)	100%

Tools (routing model)

Tool	Trigger
`session_load_context`	Load/resume project context
`session_save_ledger`	Note/log/record/remember
`session_save_handoff`	Pass state to next agent/session
`session_compact_ledger`	Shrink/prune ledger
`session_search_memory`	Recall prior session discussions
`knowledge_search`	Search stored knowledge base

Version History

Version	Eval	Type	Notes
v42	22/22 coding (100%)	Coder	Claim status patch on v39; zero tolerance policy
v39	20/22 coding	Coder	HIPAA non-blocking + CAS CO/PR fixes
v36	100% BFCL routing	Router	smem boundary + hand trigger fixes
v34	98.0% BFCL routing	Router	hand/save/smem fixes
v33	97.1% BFCL routing	Router	irrel/tran/smem fixes

GGUF Files

File	Use	Size
`qwen3-14b-v42-q4km.gguf`	Coding — production Synalux TypeScript	~9 GB
`prism-coder-14b-v36-q4km.gguf`	Routing — Prism MCP tool routing	~9 GB
`qwen3-14b-v34-q4km.gguf`	Routing (prior)	~9 GB

Usage

# Load as coding model
ollama pull dcostenco/prism-coder-14b
# Then use qwen3-14b-v42-q4km.gguf Modelfile

# Load as routing model
# Use prism-coder-14b-v36-q4km.gguf Modelfile