
GadflyII

AI & ML interests

None yet

Recent Activity

liked a model about 2 months ago

ibm-granite/granite-4.1-8b-fp8

new activity 2 months ago

GadflyII/GLM-4.6V-NVFP4:Well done nvfp4 quant

new activity 2 months ago

GadflyII/Qwen3-Coder-Next-NVFP4:Why Your NVFP4 Model Is Slower Than FP8 on the GB10 (NVIDIA Spark) — And How to Fix It


Organizations

New activity in GadflyII/GLM-4.6V-NVFP4 2 months ago

Well done nvfp4 quant

#1 opened 5 months ago by

josephbreda

New activity in GadflyII/Qwen3-Coder-Next-NVFP4 2 months ago

Why Your NVFP4 Model Is Slower Than FP8 on the GB10 (NVIDIA Spark) — And How to Fix It

🤯👍 5

#5 opened 4 months ago by

scottgl

New activity in GadflyII/GLM-4.7-Flash-MTP-NVFP4 3 months ago

SGLang and MTP

#2 opened 4 months ago by

Michalea

New activity in GadflyII/Qwen3-Coder-Next-NVFP4 4 months ago

Model requests?

#4 opened 4 months ago by

pathosethoslogos

New activity in GadflyII/GLM-4.6V-NVFP4 4 months ago

Fails on a single DGX spark with errors below

#2 opened 4 months ago by

Adrian1234

New activity in GadflyII/GLM-4.7-Flash-MXFP4 4 months ago

Update MXFP4 format to compressed-tensors

#3 opened 4 months ago by

mgoin

New activity in lukealonso/MiniMax-M2.5-NVFP4 4 months ago

Here's the vLLM recipe I'm using with 2x RTX Pro 6000

👍 3

#1 opened 4 months ago by

zenmagnets

New activity in GadflyII/Qwen3-Coder-Next-NVFP4 4 months ago

MMLU PRO Benchmark

#3 opened 4 months ago by

sevapru

vLLM 0.16?

#2 opened 4 months ago by

MMaxHugg

New activity in GadflyII/Qwen3-Coder-Next-NVFP4 5 months ago

Memory

#1 opened 5 months ago by

struxx

New activity in GadflyII/GLM-4.7-Flash-NVFP4 5 months ago

confused response

#8 opened 5 months ago by

jiangyizhi

MTP quality, 47 layer

#7 opened 5 months ago by

Michalea

New activity in GadflyII/GLM-4.7-Flash-MTP-NVFP4 5 months ago

Upload folder using huggingface_hub

#1 opened 5 months ago by

GadflyII

New activity in GadflyII/GLM-4.7-Flash-NVFP4 5 months ago

Can't deploy by vllm 0.14.1 + transformers

#6 opened 5 months ago by

Butterfly-314

New activity in GadflyII/GLM-4.7-Flash-MXFP4 5 months ago

can not run

#1 opened 5 months ago by

aliez-ren

New activity in GadflyII/GLM-4.7-Flash-NVFP4 5 months ago

please create mlx version of this

#4 opened 5 months ago by

Narutoouz

Wasn't able to recreate MMLU-Pro benchmarks

#5 opened 5 months ago by

zenmagnets

New activity in GadflyII/MiniMax-M2.1-NVFP4 5 months ago

Request for GLM 4.6V

#1 opened 6 months ago by

SFPLM

New activity in GadflyII/GLM-4.7-Flash-NVFP4 5 months ago

GadflyII/GLM-4.7-Flash-NVFP4

#3 opened 5 months ago by

Yu21342

Really appreciate that you ran performance comparison tests with BF16!

#2 opened 5 months ago by

zenmagnets
