Can you also make one for the captioner?
https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Captioner
I’d really appreciate it if you could make it.
Additionally, it would be great if you could also extract the vision transformer.
https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
Hi bro, have you found the captioner encoder yet?
As far as I understand, the encoder of the Captioner model is the same as that of the Instruct model. Is there any difference between them?
Hey, I think there’s one thing you’re missing: the Captioner checkpoint went through a post-training full-parameter fine-tuning stage. Even though this fine-tuning was done jointly with the rest of the model rather than on the encoder alone, we can still reasonably treat the Captioner encoder as a stronger, more general-purpose audio representation model.
I’m working on a paper comparing different audio encoders. Would it be possible for you to provide a standalone encoder checkpoint for the Captioner model, or some guidance / code on how to extract it? It would be extremely helpful for my research and would save a lot of time. Many thanks in advance for your help!
Best regards,
mifanbushipeicai
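In case it helps in the meantime, here is a minimal sketch of one way to extract a standalone encoder: load the full checkpoint's state dict (e.g. via `safetensors` or `transformers`) and keep only the keys under the encoder's prefix, stripping the prefix so the weights load into the standalone module. The prefix `thinker.audio_tower.` is an assumption about the checkpoint layout, not confirmed against the actual model — print a few keys from the real state dict to verify it first.

```python
# Sketch: extract a sub-module's weights from a full multimodal checkpoint
# by filtering state-dict keys on a prefix.
# NOTE: "thinker.audio_tower." is an ASSUMED prefix; check the real
# checkpoint's key names before relying on it.

def extract_submodule(state_dict, prefix):
    """Return a new state dict with only the keys under `prefix`,
    prefix stripped so it loads into the standalone module."""
    return {
        key[len(prefix):]: tensor
        for key, tensor in state_dict.items()
        if key.startswith(prefix)
    }

# Toy demonstration with strings standing in for real tensors:
full = {
    "thinker.audio_tower.layers.0.weight": "w0",
    "thinker.audio_tower.layers.0.bias": "b0",
    "thinker.model.embed_tokens.weight": "emb",
}
encoder_only = extract_submodule(full, "thinker.audio_tower.")
print(sorted(encoder_only))  # ['layers.0.bias', 'layers.0.weight']
```

For a real checkpoint you would iterate over the sharded `.safetensors` files the same way, then save the filtered dict as a new file and load it into the encoder class with `strict=True` to catch any key mismatches.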
Thank you very much, and looking forward to it!
Got it.
Please wait a moment while I get things ready.
Thanks a lot, really looking forward to it!
I've uploaded my collection here. The inference code is provisional, so some of it may not work.
Thank you a lot !!!