Why don't the weights match between the transformers version and the mistral version?

#40
by fahadh4ilyas - opened

I tried to match the weights between the safetensors file for transformers (model.safetensors) and the safetensors file for mistral (consolidated.safetensors).

It seems that

  • layers.n.attention.wk.weight from mistral does not match language_model.model.layers.n.self_attn.k_proj.weight
  • layers.n.attention.wq.weight from mistral does not match language_model.model.layers.n.self_attn.q_proj.weight

All the rest match perfectly. Why? How do you map the weights there?
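
For reference, the comparison can be reproduced with the safetensors API along these lines (a minimal sketch; file names and key names are taken from this post, and a real checkpoint may be sharded across several files):

```python
import torch
from safetensors.torch import load_file

# Load both checkpoint formats (paths are placeholders for local files).
hf_weights = load_file("model.safetensors")              # transformers format
mistral_weights = load_file("consolidated.safetensors")  # mistral format

layer = 0  # compare a single layer as an example
wk = mistral_weights[f"layers.{layer}.attention.wk.weight"]
k_proj = hf_weights[f"language_model.model.layers.{layer}.self_attn.k_proj.weight"]

# Same shape, but elementwise comparison fails (see the reply below for why).
print(wk.shape == k_proj.shape)   # True
print(torch.equal(wk, k_proj))    # False
```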

Mistral AI org

Hey @fahadh4ilyas

Due to different choices of RoPE implementation, the weights are the same but permuted, as you can see in the conversion script.
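
Concretely, the conversion scripts in transformers apply a permutation along these lines to wq and wk (a sketch; the exact function name and call sites depend on the transformers version):

```python
import torch

def permute(w: torch.Tensor, n_heads: int, dim1: int, dim2: int) -> torch.Tensor:
    # Reorder each head's rows from the interleaved RoPE layout used by the
    # mistral checkpoint (pairs x0,x1 / x2,x3 / ...) to the half-split layout
    # used by transformers (x0, x2, ... first, then x1, x3, ...).
    return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)

# Roughly, for each layer n (dimension names here are illustrative):
#   q_proj.weight == permute(wq.weight, n_heads,    hidden_size,           hidden_size)
#   k_proj.weight == permute(wk.weight, n_kv_heads, n_kv_heads * head_dim, hidden_size)
```

Applying this permutation to wq/wk from consolidated.safetensors should make them match q_proj/k_proj; all other tensors are copied through unchanged, which is why the rest match directly.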

juliendenize changed discussion status to closed
