Heads Up (May 1): Transformers Config Fix – What It Means for GGUFs & Quantized & Fine-Tuned Models

by juliendenize

Hey everyone,

Thanks to Unsloth’s help, we applied a fix to the Transformers config. The bug caused performance degradation, especially for medium-to-long-context usage, because it affected how RoPE computations are handled.

Unfortunately, the bug propagated to all GGUFs, quantizations, and fine-tuned models created from the Transformers config prior to the fix.

Please give it another try with up-to-date models to ensure you get the best performance. Unsloth’s GGUFs are already up to date.
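If you want to verify that a local or derived checkpoint picked up the corrected config, here is a minimal sketch; the model ID is a placeholder, and it simply prints the RoPE-scaling section of the config so you can compare it against the updated config.json published on the Hub (it does not restate which fields changed).

```python
from transformers import AutoConfig

# Placeholder model ID - replace with the checkpoint or fine-tune you actually use.
model_id = "your-org/your-finetune-or-quant"

# force_download=True makes sure you are not reading a stale cached config.json.
config = AutoConfig.from_pretrained(model_id, force_download=True)

# The fix touched how RoPE parameters are configured, so compare this block
# against the updated config on the Hub.
print(getattr(config, "rope_scaling", None))
```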

Sorry for the inconvenience. As always, we welcome feedback, so let us know how it goes!

This bug has not affected vLLM, which is our recommended format. We’re working on improving our testing coverage across the ecosystem.


Thanks for working with us, Julien, and for the fast turnaround! We found the model to work extremely well now!

Thanks for the update and for figuring out the fix itself!
Perplexity (PPL) noticeably dropped after applying it.

When will mlx support be in place?

Prior models are for sure not affected?

When will mlx support be in place?

@Thump604 I reached out to the people behind models that publicly report being based on this one through the HF metadata. I hope the people maintaining these conversions will consider updating their models - if needed - but it's up to them, as it's purely voluntary :). I strongly suggest opening discussions to kindly ask them if they can do it, or upvoting relevant discussions if they already exist.

Prior models are for sure not affected?

@Lockout I don't think that is the case. I got a wrongly converted checkpoint by running the conversion script maintained in Transformers - a PR is in progress to fix that.

I do not think previous models were impacted. Since this arch (ministral3) has existed, we have released:

  • Devstral Small and Medium 2: the first has mscale_all_dim=1 and the second mscale_all_dim=0, which is correct.
  • Ministraux: all have mscale_all_dim=1, which is correct.

Note that this is for the Transformers configuration; I do not know whether llama.cpp was already handling mscale_all_dim correctly at the time of those releases, though I expect it was. Did you experience issues with previous releases suggesting otherwise?
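For context on why this parameter matters: in the DeepSeek-style YaRN scaling that several Transformers models share, mscale_all_dim feeds into the attention scaling factor roughly as sketched below. This is an illustration under that assumption, not necessarily the exact code path of this architecture, and the extension factor used here is made up.

```python
import math

def yarn_get_mscale(scale: float = 1.0, mscale: float = 1.0) -> float:
    # Standard YaRN magnitude-correction helper: no correction when the
    # context is not actually being extended (scale <= 1).
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0

factor = 4.0   # hypothetical context-extension factor, for illustration only
mscale = 1.0

# In DeepSeek-style YaRN, the attention factor is the ratio of the two helpers,
# so flipping mscale_all_dim between 0 and 1 changes long-context attention scaling.
for mscale_all_dim in (0.0, 1.0):
    attention_factor = yarn_get_mscale(factor, mscale) / yarn_get_mscale(factor, mscale_all_dim)
    print(f"mscale_all_dim={mscale_all_dim}: attention_factor={attention_factor:.4f}")
```

With mscale_all_dim=0 the denominator stays at 1.0, so the attention factor grows with the extension factor; with mscale_all_dim=1 (and mscale=1) the ratio is 1.0, which is why a wrong value mostly shows up at longer contexts.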

After taking a look at the fix, I think this will work fine as is for MLX - redownloading and verifying now.
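A minimal way to sanity-check the refreshed weights with mlx-lm, assuming an MLX conversion of the updated checkpoint exists; the repo ID below is a placeholder.

```python
from mlx_lm import load, generate

# Placeholder repo ID - point this at the MLX conversion you actually use.
model, tokenizer = load("mlx-community/your-updated-model")

# A short generation is enough to confirm the refreshed config loads and runs.
print(generate(model, tokenizer, prompt="Hello, how are you?", max_tokens=64))
```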

You are the best, thanks for your continuous work for the community. Thanks Unsloth, thanks Julien, and all the others who were involved.
