Is there a benefit of this version vs the original MXFP4?

#5
by SuperbEmphasis - opened

I'm currently running gpt-oss-120b on 2x H100 GPUs via vLLM.

But is there a benefit to using this version? I'm wondering whether FP8 would give faster responses on the H100, since the H100 can use its FP8 tensor cores, at the cost of increased VRAM usage.
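For a rough sense of the VRAM side of that trade-off, here is a back-of-the-envelope sketch of weight memory only. It assumes a round figure of about 117B parameters and, as a simplification, treats every weight as stored in the quantized format, which is probably not true of either checkpoint. It also ignores KV cache, activations, and runtime overhead. The helper name `weight_gib` is made up for illustration.

```python
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GiB for n_params stored at bits_per_param."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: roughly 117e9 parameters in total (a round figure, not an exact count).
N_PARAMS = 117e9

# MXFP4 stores 4-bit elements plus one shared 8-bit scale per block of 32 elements.
MXFP4_BITS = 4 + 8 / 32

# FP8 stores 8 bits per element; per-tensor or per-channel scales are ignored here.
FP8_BITS = 8.0

mxfp4_gib = weight_gib(N_PARAMS, MXFP4_BITS)
fp8_gib = weight_gib(N_PARAMS, FP8_BITS)

print(f"MXFP4 weights: ~{mxfp4_gib:.0f} GiB")
print(f"FP8 weights:   ~{fp8_gib:.0f} GiB")
print(f"FP8 / MXFP4:   ~{fp8_gib / mxfp4_gib:.2f}x")
```

Under those assumptions FP8 weights take a bit under twice the memory of MXFP4, so the question is really whether any speedup is worth giving up that headroom on two cards.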
