Qwen3-Next-80B-A3B Instruct & Thinking (Abliterated/Uncensored if possible)
https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct
https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking
Uggh, no llama.cpp support. 😢
This is a serious problem for all the new model architectures coming out.
The Qwen3NextForCausalLM architecture is not currently supported by llama.cpp, and while there is very high demand for it, the task is monumental: it requires far more than can reasonably be expected of volunteers contributing to an open-source project in their spare time. So much is missing that I don't see support landing anytime soon, despite many people working on the missing pieces. I'm not surprised the Qwen team decided not to contribute the implementation themselves given the amount of work involved, but it obviously would have been far better for the community if they had, as this is the perfect model to run offloaded to RAM. Please see https://github.com/ggml-org/llama.cpp/issues/15940 for more information.
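Until llama.cpp support lands, the model can still be run through Hugging Face transformers, assuming a build recent enough to include the Qwen3-Next architecture. A minimal sketch (model ID from the links above; generation settings are just illustrative defaults):

```python
# Minimal inference sketch via transformers, since no GGUF/llama.cpp path exists yet.
# Note: the bf16 weights are ~160 GB, so device_map="auto" will shard across GPUs
# and spill the remainder to CPU RAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"  # or the -Thinking variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # shard across available devices
)

messages = [{"role": "user", "content": "Give me a short introduction to MoE models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```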
Regarding abliteration: unfortunately it doesn't seem to work for any Qwen3-based model, so the only way to uncensor them is fine-tuning. While possible, that is obviously a bit of an issue for an 80B model. The only place where I could realistically do it is Richard's supercomputer with 4x A100 40 GiB, but even there I would need to offload to RAM, which isn't possible for such a large model until he upgrades to 512 GiB of RAM. I could try on StormPeek using 2x RTX 4090, which works for 70B at low context, but 80B seems to be pushing it too far; still, maybe worth a try. Other than that, the only option seems to be renting a server with something like H200 GPUs, but the model doesn't seem worth what that would cost. If you want to uncensor it yourself, just do 7 epochs over https://huggingface.co/datasets/ICEPVP8977/Uncensored_Small_Reasoning using axolotl and it will be uncensored; a rough sketch of such a run follows.
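The recipe above uses axolotl; purely as an illustration, here is an equivalent LoRA fine-tune written against TRL's SFTTrainer instead, so the steps are visible in Python. Everything except the dataset and the 7 epochs is an assumption: the LoRA hyperparameters, batch sizes, and learning rate are generic starting points, and the dataset's column layout should be checked on the Hub before running.

```python
# Hedged sketch of the uncensoring fine-tune (TRL SFTTrainer swapped in for axolotl).
# Assumes the dataset is in a conversational format TRL can consume; adjust to the
# dataset's actual schema. Hardware requirements are as discussed above.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("ICEPVP8977/Uncensored_Small_Reasoning", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-Next-80B-A3B-Instruct",  # or the -Thinking variant
    train_dataset=dataset,
    peft_config=LoraConfig(       # LoRA keeps the trainable footprint of an 80B model small
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules="all-linear",
        task_type="CAUSAL_LM",
    ),
    args=SFTConfig(
        output_dir="qwen3-next-80b-uncensored",
        num_train_epochs=7,       # the 7 epochs from the recipe above
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        bf16=True,
        gradient_checkpointing=True,
    ),
)
trainer.train()
```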