Why don't the weights match between the transformers version and the mistral version?

#40
by fahadh4ilyas - opened

I tried to match the weights between the safetensors file for transformers (model.safetensors) and the safetensors file for mistral (consolidated.safetensors).

It seems that

  • layers.n.attention.wk.weight from mistral does not match language_model.model.layers.n.self_attn.k_proj.weight
  • layers.n.attention.wq.weight from mistral does not match language_model.model.layers.n.self_attn.q_proj.weight

All the rest match perfectly. Why? How do you map the weights there?
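
For reference, the comparison can be reproduced with the safetensors API along these lines (a minimal sketch; file names and key names are taken from this post, and a real checkpoint may be sharded across several files):

```python
import torch
from safetensors.torch import load_file

# Load both checkpoint formats (paths are placeholders for local files).
hf_weights = load_file("model.safetensors")              # transformers format
mistral_weights = load_file("consolidated.safetensors")  # mistral format

layer = 0  # compare a single layer as an example
wk = mistral_weights[f"layers.{layer}.attention.wk.weight"]
k_proj = hf_weights[f"language_model.model.layers.{layer}.self_attn.k_proj.weight"]

# Same shape, but elementwise comparison fails (see the reply below for why).
print(wk.shape == k_proj.shape)   # True
print(torch.equal(wk, k_proj))    # False
```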

Mistral AI org

Hey @fahadh4ilyas

Due to different choices of RoPE implementation, the weights are the same but permuted, as you can see in the conversion script.
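
Concretely, the conversion scripts in transformers apply a permutation along these lines to wq and wk (a sketch; the exact function name and call sites depend on the transformers version):

```python
import torch

def permute(w: torch.Tensor, n_heads: int, dim1: int, dim2: int) -> torch.Tensor:
    # Reorder each head's rows from the interleaved RoPE layout used by the
    # mistral checkpoint (pairs x0,x1 / x2,x3 / ...) to the half-split layout
    # used by transformers (x0, x2, ... first, then x1, x3, ...).
    return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)

# Roughly, for each layer n (dimension names here are illustrative):
#   q_proj.weight == permute(wq.weight, n_heads,    hidden_size,           hidden_size)
#   k_proj.weight == permute(wk.weight, n_kv_heads, n_kv_heads * head_dim, hidden_size)
```

Applying this permutation to wq/wk from consolidated.safetensors should make them match q_proj/k_proj; all other tensors are copied through unchanged, which is why the rest match directly.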

juliendenize changed discussion status to closed
