Is there a benefit of this version vs the original MXFP4?

by SuperbEmphasis - opened Sep 13

I'm currently running gpt-oss-120b on 2x H100 GPUs via vLLM.

Is there a benefit to using this version instead? I'm wondering whether FP8 would give faster responses on the H100, since the H100 can use its FP8 tensor cores, at the cost of increased VRAM usage.
