Hito 1.7B - GGUF

Quantized versions for llama.cpp, Ollama, LM Studio, and more


About

This repository contains GGUF quantized versions of hitonet/hito-1.7b.

Hito is a 1.7B-parameter model built around Nested Cognitive Reasoning (NCR): structured, self-correcting thinking patterns that improve accuracy and transparency.

For the original model (safetensors), training details, benchmarks, and full documentation, see the main repository.


Available Quantizations

⭐ Recommended

| File | Quant | Size | Quality | Notes |
|------|-------|------|---------|-------|
| hito-1.7b-Q4_K_M.gguf | Q4_K_M | 1.1 GB | ⭐ Best | Best balance of size and quality |
| hito-1.7b-Q5_K_M.gguf | Q5_K_M | 1.2 GB | ⭐ Excellent | Slightly better than Q4_K_M |
| hito-1.7b-Q8_0.gguf | Q8_0 | 1.8 GB | ⭐ Excellent | Highest-quality quantization |

✅ Good Quality

| File | Quant | Size | Quality | Notes |
|------|-------|------|---------|-------|
| hito-1.7b-Q4_0.gguf | Q4_0 | 1.0 GB | ✅ Good | Legacy format, works well |
| hito-1.7b-Q4_K_S.gguf | Q4_K_S | 1.0 GB | ✅ Good | Smaller Q4 variant |
| hito-1.7b-Q5_0.gguf | Q5_0 | 1.2 GB | ✅ Good | Legacy 5-bit |
| hito-1.7b-Q5_K_S.gguf | Q5_K_S | 1.2 GB | ✅ Good | Smaller Q5 variant |
| hito-1.7b-Q6_K.gguf | Q6_K | 1.4 GB | ✅ Excellent | Near full quality |
| hito-1.7b-F16.gguf | F16 | 3.3 GB | ✅ Reference | Full-precision GGUF |

⚠️ Low Quality (Not Recommended)

| File | Quant | Size | Quality | Notes |
|------|-------|------|---------|-------|
| hito-1.7b-Q3_K_L.gguf | Q3_K_L | 957 MB | ⚠️ Fair | May get stuck in thinking |
| hito-1.7b-Q3_K_M.gguf | Q3_K_M | 896 MB | ⚠️ Fair | Occasional issues |
| hito-1.7b-Q3_K_S.gguf | Q3_K_S | 827 MB | ⚠️ Fair | Noticeable quality loss |

❌ Broken (Do Not Use)

| File | Quant | Size | Quality | Notes |
|------|-------|------|---------|-------|
| hito-1.7b-Q2_K.gguf | Q2_K | 742 MB | ❌ Broken | Produces gibberish |

Quick Start

Ollama

# Download the recommended quantization
wget https://huggingface.co/hitonet/hito-1.7b-GGUF/resolve/main/hito-1.7b-Q4_K_M.gguf

# Create Modelfile
cat > Modelfile << 'EOF'
FROM hito-1.7b-Q4_K_M.gguf
SYSTEM "You are Hito by Hitonet.com."
PARAMETER temperature 0.7
PARAMETER stop "<|im_end|>"
EOF

# Create and run
ollama create hito -f Modelfile
ollama run hito

llama.cpp

./llama-cli -m hito-1.7b-Q4_K_M.gguf \
  -sys "You are Hito by Hitonet.com." \
  -p "What is your name?" \
  -n 256
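llama.cpp also ships `llama-server`, which exposes the model over an OpenAI-compatible HTTP API (start it with `./llama-server -m hito-1.7b-Q4_K_M.gguf --port 8080`). Below is a minimal client sketch assuming a server running locally on port 8080; the function names here are illustrative, not part of any API:

```python
import json
import urllib.request

def build_chat_request(user_prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for llama-server."""
    return {
        "messages": [
            {"role": "system", "content": "You are Hito by Hitonet.com."},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
        "max_tokens": 256,
    }

def ask_hito(user_prompt: str,
             url: str = "http://localhost:8080/v1/chat/completions") -> str:
    """POST the payload to a running llama-server and return the reply text."""
    data = json.dumps(build_chat_request(user_prompt)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# ask_hito("What is your name?")  # requires llama-server to be running
```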

LM Studio

  1. Download any GGUF file from this repository
  2. Open LM Studio → Load Model
  3. Set system prompt: You are Hito by Hitonet.com.
  4. Start chatting!

Compatibility

These GGUF files work with:

  • Ollama (recommended)
  • llama.cpp
  • LM Studio
  • Jan
  • GPT4All
  • llama-cpp-python
  • Any llama.cpp-compatible application
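For `llama-cpp-python`, here is a minimal sketch, assuming `pip install llama-cpp-python` and the Q4_K_M file in the working directory. The ChatML-style template is inferred from the `<|im_end|>` stop token used in the Modelfile above and may not match Hito's exact chat template:

```python
def chatml_prompt(system: str, user: str) -> str:
    """Format a ChatML-style prompt (inferred from the <|im_end|>
    stop token in the Modelfile; an assumption, not confirmed)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def generate(user_prompt: str, model_path: str = "hito-1.7b-Q4_K_M.gguf") -> str:
    """Load the GGUF with llama-cpp-python and run one completion."""
    from llama_cpp import Llama  # pip install llama-cpp-python
    llm = Llama(model_path=model_path, n_ctx=4096, verbose=False)
    out = llm(
        chatml_prompt("You are Hito by Hitonet.com.", user_prompt),
        max_tokens=256,
        stop=["<|im_end|>"],
    )
    return out["choices"][0]["text"]

# generate("What is your name?")  # requires the GGUF file locally
```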

What Makes Hito Special

  • Structured Thinking: Uses <think> tags with nested cognitive reasoning
  • Self-Correcting: <doubt> and <verify> tags catch errors mid-reasoning
  • Humble by Design: Admits uncertainty rather than hallucinating
  • Efficient: Only 1.7B parameters, runs on CPU
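A hedged sketch of consuming this structured output downstream: the tag layout is taken from the list above, and the code assumes the reasoning arrives in a single `<think>...</think>` block, which the model's actual output format may not guarantee.

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split raw model output into (reasoning trace, final answer).

    Assumes one <think>...</think> block, as described above; nested
    tags like <doubt> and <verify> stay inside the reasoning trace.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        return "", raw.strip()
    return match.group(1).strip(), raw[match.end():].strip()

example = "<think>2+2 is 4. <verify>Yes, 4.</verify></think>The answer is 4."
thoughts, answer = split_reasoning(example)
# thoughts -> "2+2 is 4. <verify>Yes, 4.</verify>"
# answer   -> "The answer is 4."
```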

For full documentation, benchmarks, and training details, see the main repository.


Licensing

| Component | License | Commercial Use |
|-----------|---------|----------------|
| Model Weights | Apache 2.0 | ✅ Free |
| NCR Methodology | CC BY-NC-ND | ⚠️ License Required |

Contact: [email protected]


Made with genuine curiosity by Hitonet