EryriLabs committed on Feb 18

Commit ea67aa4 (verified) · 1 Parent(s): cf0a736

Update README.md

Files changed (1)

README.md +10 -51

@@ -20,15 +20,10 @@ model-index:
   - name: dutybot-GGUF
     results: []
 ---
 # DutyBot GGUF
 **A domain-adapted language model for UK policing — offences, points to prove, PACE powers, and operational guidance.**
 Built for the [DutyBot](https://github.com/dwain-barnes/dutybot) Docker application. For training and educational purposes only.
 ## Model Details
 | | |
 |---|---|
 | **Base model** | [unsloth/gpt-oss-20b](https://huggingface.co/unsloth/gpt-oss-20b) |
@@ -40,26 +35,19 @@ Built for the [DutyBot](https://github.com/dwain-barnes/dutybot) Docker applicat
 | **Quantisation** | Q4_K_M |
 | **File size** | ~14.7 GB |
 | **Chat template** | ChatML (`<\|im_start\|>` / `<\|im_end\|>`) |
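For readers unfamiliar with ChatML, the template named in the table wraps each turn in `<|im_start|>` and `<|im_end|>` markers. The sketch below is illustrative only: `render_chatml` is a hypothetical helper that is not part of this repository, the user question is made up, and llama.cpp applies the template itself when `--chat-template chatml` is set, so this is mainly useful for seeing what the model receives.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is: <|im_start|>ROLE, newline, CONTENT, <|im_end|>, newline.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([
    {"role": "system",
     "content": "You are DutyBot, a UK Police Duty Assistant for training purposes."},
    # Hypothetical user question, for illustration only.
    {"role": "user", "content": "What are the points to prove for theft?"},
])
print(prompt)
```

The two stop strings recommended in this README correspond to these same markers, which is why generation halts cleanly at the end of the assistant turn.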
 ## How to Use
 ### With DutyBot (recommended)
 The easiest way to use this model is with the [DutyBot Docker app](https://github.com/dwain-barnes/dutybot) which provides a full chat UI, conversation history, memory, and automatic legislation verification:
 ```bash
 git clone https://github.com/dwain-barnes/dutybot.git
 cd dutybot
 docker compose up
 # Open http://localhost:5000
 ```
 ### With llama.cpp directly
 ```bash
 # Download
 huggingface-cli download EryriLabs/dutybot-GGUF domain_adapted-Q4_K_M.gguf --local-dir ./models
 # Run server
 llama-server \
   --model ./models/domain_adapted-Q4_K_M.gguf \
@@ -68,9 +56,7 @@ llama-server \
   --n-gpu-layers 999 \
   --chat-template chatml
 ```
 Then query the OpenAI-compatible API:
 ```bash
 curl http://localhost:8080/v1/chat/completions \
   -H "Content-Type: application/json" \
@@ -86,19 +72,15 @@ curl http://localhost:8080/v1/chat/completions \
     "frequency_penalty": 0.6
   }'
 ```
 ### With Python (llama-cpp-python)
 ```python
 from llama_cpp import Llama
 llm = Llama(
     model_path="./models/domain_adapted-Q4_K_M.gguf",
     n_ctx=4096,
     n_gpu_layers=-1,
     chat_format="chatml",
 )
 response = llm.create_chat_completion(
     messages=[
         {"role": "system", "content": "You are DutyBot, a UK Police Duty Assistant for training purposes."},
@@ -111,23 +93,16 @@ response = llm.create_chat_completion(
 )
 print(response["choices"][0]["message"]["content"])
 ```
 ## Training Details
 ### Corpus
 The training corpus covers UK criminal law across these domains:
 - **Criminal offences** — definitions, elements, and points to prove for offences under major UK statutes (Theft Act 1968, Offences Against the Person Act 1861, Criminal Damage Act 1971, Sexual Offences Act 2003, Misuse of Drugs Act 1971, and others)
 - **PACE** — Police and Criminal Evidence Act 1984 codes of practice (stop and search, arrest, detention, investigation, identification)
 - **Sentencing** — Sentencing Council guidelines and magistrates' court sentencing guidelines
 - **CPS guidance** — Crown Prosecution Service charging standards and legal guidance
 - **Operational policing** — powers, procedures, and general policing knowledge
 The corpus was structured as 10,511 text chunks, totalling approximately 10.7 million tokens.
 ### Method
 - **Continued pretraining (CPT)** — the model was exposed to the full corpus to inject domain knowledge, rather than instruction-tuning for a specific format
 - **QLoRA** — 4-bit quantised base weights with rank-64 LoRA adapters in bf16, reducing GPU memory requirements
 - **Hyperparameters:**
@@ -138,9 +113,7 @@ The corpus was structured as 10,511 text chunks, totalling approximately 10.7 mi
   - Total steps: 1,971
 - **Hardware:** 2x NVIDIA RTX 3090 (24GB each)
 - **Software:** [Unsloth](https://github.com/unslothai/unsloth) + HuggingFace Transformers + TRL
 ### Loss Curve
 | Step | Training Loss |
 |------|--------------|
 | 0 | 3.90 |
@@ -148,40 +121,28 @@ The corpus was structured as 10,511 text chunks, totalling approximately 10.7 mi
 | 500 | 1.80 |
 | 670 | 1.73 |
 | 1000 | ~1.65 |
 The loss showed healthy, monotonic decline indicating successful knowledge injection without catastrophic forgetting.
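One way to read the loss figures is to convert them to perplexity. The sketch below assumes the reported values are mean per-token cross-entropy in nats, which the README does not state explicitly; it uses only the rows visible in this excerpt, and the step-1000 value is the approximate figure given in the table.

```python
import math

# Training-loss values copied from the rows of the table shown above.
losses = {0: 3.90, 500: 1.80, 670: 1.73, 1000: 1.65}

# Assumption: loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexities = {step: math.exp(loss) for step, loss in losses.items()}

for step, ppl in perplexities.items():
    print(f"step {step:>4}: loss {losses[step]:.2f} -> perplexity {ppl:.1f}")
```

Under that assumption the drop from 3.90 to about 1.65 is roughly a ninefold reduction in perplexity on the training corpus. Training loss alone does not measure forgetting of general capabilities, which would need a separate held-out evaluation.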
 ## Intended Use
 ### In scope
 - Police training exercises and scenario planning
 - Educational materials about UK criminal law
 - Studying offence definitions, points to prove, and powers
 - Building training tools for police forces and law enforcement academies
 ### Out of scope
 - **Live operational policing decisions** — this model is not a substitute for professional legal advice, force policy, or the judgement of trained officers
 - **Legal advice** — the model may produce inaccurate or incomplete legal information
 - **Jurisdictions outside England & Wales** — the training data is primarily based on English and Welsh law; Scottish and Northern Irish law differ significantly
 ## Limitations
 - **May fabricate legal definitions** — like all language models, DutyBot can generate plausible-sounding but incorrect legal information. Always verify against official sources.
 - **Training data currency** — the corpus reflects law as of the training date. Legislation changes frequently.
 - **Repetition** — the model can sometimes repeat itself, especially on longer generations. Using `frequency_penalty: 0.6` and `max_tokens: 512` helps mitigate this.
 - **No case law** — the training data focuses on statute law and guidance rather than case law precedents.
 ## System Prompt
 For best results, use this system prompt:
 ```
 You are DutyBot, a UK Police Duty Assistant. You help police officers with
 operational guidance, definitions of offences, points to prove, and general
 policing knowledge based on UK law.
 IMPORTANT CONSTRAINTS:
 - You are for TRAINING AND EDUCATIONAL PURPOSES ONLY — never for live operational use
 - Always encourage officers to verify guidance against local force policy and official sources
@@ -189,9 +150,7 @@ IMPORTANT CONSTRAINTS:
 - If unsure, say so clearly — never fabricate legal definitions
 - When legislation lookup results are provided, use them to ground your answer
 ```
 ## Recommended Inference Parameters
 | Parameter | Value | Notes |
 |-----------|-------|-------|
 | `temperature` | 0.3 | Low temperature for factual responses |
@@ -200,19 +159,14 @@ IMPORTANT CONSTRAINTS:
 | `presence_penalty` | 0.3 | Encourages topic diversity |
 | `stop` | `["<\|im_end\|>", "<\|im_start\|>"]` | Proper turn boundaries |
 | `ctx_size` | 4096 | Good balance of context and speed |
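As a rough illustration, the request-side parameters can be collected into one payload for the OpenAI-compatible endpoint. This is a sketch under stated assumptions: `build_payload` and `ask` are hypothetical helpers, only values visible in this excerpt are included (the `frequency_penalty` and `max_tokens` figures come from the Limitations note), the `model` field is omitted on the assumption that `llama-server` serves its single loaded model without it, and `ctx_size` is left out because it is set when the server starts rather than per request.

```python
import json
import urllib.request

# Values taken from this README: the parameter table plus the Limitations note.
RECOMMENDED = {
    "temperature": 0.3,
    "frequency_penalty": 0.6,
    "presence_penalty": 0.3,
    "max_tokens": 512,
    "stop": ["<|im_end|>", "<|im_start|>"],
}


def build_payload(question, system_prompt):
    """Build an OpenAI-style chat-completions request body."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        **RECOMMENDED,
    }


def ask(question, system_prompt, base_url="http://localhost:8080/v1"):
    """POST the request to a running llama-server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(question, system_prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask(...)` requires the server from the llama.cpp section to be running on port 8080; `build_payload` can be inspected on its own without a server.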
 ## Hardware Requirements
 | Setup | VRAM | Speed |
 |-------|------|-------|
 | 2x RTX 3090 (full offload) | ~16GB total | Fast |
 | 1x RTX 3090/4090 (partial offload) | 24GB | Moderate |
 | CPU only | 0 (uses RAM) | Slow (~1-2 tok/s) |
 Minimum 16GB system RAM recommended. The GGUF file itself is 14.7GB.
 ## Citation
 ```bibtex
 @misc{dutybot2026,
   title={DutyBot: A Domain-Adapted Language Model for UK Police Training},
@@ -221,17 +175,22 @@ Minimum 16GB system RAM recommended. The GGUF file itself is 14.7GB.
   url={https://huggingface.co/EryriLabs/dutybot-GGUF}
 }
 ```
-## License
-**CC-BY-NC-ND-4.0** — Non-commercial use only. No derivatives without permission.
+## Disclaimer
+This model and associated software are provided strictly for **research and educational purposes only**. They are **not intended for production use, operational deployment, or commercial purposes**.
+- **No warranty**: This model is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, or non-infringement.
+- **No liability**: The author(s) accept no responsibility or liability for any errors, omissions, or outcomes arising from the use of this model or its outputs.
+- **Not legal advice**: Nothing produced by this model constitutes legal, professional, or operational advice. Outputs may be inaccurate, incomplete, or outdated. Always consult qualified professionals and official sources.
+- **Not for operational policing**: This model must not be used for live operational decision-making. It is not a substitute for professional judgement, force policy, or official legal guidance.
+- **Non-commercial use only**: The model weights are licensed under CC-BY-NC-ND-4.0 and must not be used for commercial purposes.
+- **Use at your own risk**: You are solely responsible for how you use this model and any decisions made based on its output.
+## License
+**CC-BY-NC-ND-4.0** — Non-commercial use only. No derivatives without permission.
 The training corpus contains Crown copyright material used under the Open Government Licence.
 ## Acknowledgements
 - [GPT-OSS 20B](https://huggingface.co/unsloth/gpt-oss-20b) base model
 - [Unsloth](https://github.com/unslothai/unsloth) for efficient QLoRA training
 - [llama.cpp](https://github.com/ggml-org/llama.cpp) for GGUF inference
 - UK legislation sourced from [legislation.gov.uk](https://www.legislation.gov.uk/)