Is there a benefit of this version vs the original MXFP4?
#5, opened by SuperbEmphasis
I'm currently running gpt-oss-120b on 2x H100 GPUs via vLLM.
Is there a benefit to using this version instead? I'm wondering whether FP8 would give faster responses on the H100, since the H100 can use its FP8 cores, at the cost of increased VRAM usage.