Thank you very much! Phenomenal on old hardware.

#18
by LadislavDanis - opened

On my old Dell OptiPlex SFF 3050

Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
64 GB RAM DDR4
Tesla P4 8GB VRAM


with the command (line continuations added so it can be pasted as one command; the `.*` in the override pattern was likely eaten by markdown italics):

llama-server \
  --model /home/testbox/.cache/llama.cpp/unsloth_Qwen3-Coder-30B-A3B-Instruct-GGUF_Qwen3-Coder-30B-A3B-Instruct-UD-Q8_K_XL.gguf \
  --threads 3 \
  --batch-size 256 \
  --ctx-size 65536 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --jinja \
  --temp 0.7 --min-p 0.01 --top-p 0.80 --top-k 20 --repeat-penalty 1.05 \
  --flash-attn on \
  -a qwen3-coder-30-a3b-p4 \
  --n-gpu-layers 48 \
  --override-tensor ".blk.([1-9]|[1-3][0-9]|4[0-6]).ffn_.*_exps.*=CPU" \
  --host 0.0.0.0 \
  --port 4000

I consistently achieve a beautiful ~7 t/s.

If my RAM weren't running in degraded dual-channel mode because of its size (my Dell officially supports only 32 GB), I believe the result would be even better.
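For anyone adapting this to another model: the `--override-tensor` regex pins the expert FFN tensors of layers 1–46 to the CPU, while layer 0, layer 47, and all attention/shared tensors stay on the 8 GB GPU. A quick sanity check of just the layer-number alternation (a sketch; the tensor name used here is illustrative):

```python
import re

# Layer-number part of the --override-tensor pattern:
# [1-9] matches 1-9, [1-3][0-9] matches 10-39, 4[0-6] matches 40-46.
pattern = re.compile(r"blk\.([1-9]|[1-3][0-9]|4[0-6])\.")

# Which of the model's 48 layers would have their expert tensors kept on CPU?
matched = [n for n in range(48) if pattern.search(f"blk.{n}.ffn_up_exps.weight")]
print(matched)  # layers 1 through 46
```

Note that layer 0 and 47 fall outside the alternation, so their experts remain on the GPU; widening or narrowing the ranges is the knob to trade VRAM for t/s on other cards.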
