Instructions to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="freakyskittle/MiniMax-M2.75-460B-A20B-GGUF", filename="MiniMax-M2.75-460B-A20B.Q1_0.gguf", )
output = llm( "Once upon a time,", max_tokens=512, echo=True ) print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
Use Docker
docker model run hf.co/freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "freakyskittle/MiniMax-M2.75-460B-A20B-GGUF" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "freakyskittle/MiniMax-M2.75-460B-A20B-GGUF", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
- Ollama
How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with Ollama:
ollama run hf.co/freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
- Unsloth Studio
How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for freakyskittle/MiniMax-M2.75-460B-A20B-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for freakyskittle/MiniMax-M2.75-460B-A20B-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for freakyskittle/MiniMax-M2.75-460B-A20B-GGUF to start chatting
- Docker Model Runner
How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with Docker Model Runner:
docker model run hf.co/freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
- Lemonade
How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.MiniMax-M2.75-460B-A20B-GGUF-Q4_K_M
List all available models
lemonade list
MiniMax-M2.75-460B-A20B-GGUF
Community-quantized GGUF versions of selimaktas/MiniMax-M2.75-460B-A20B.
This is a 460B parameter Mixture-of-Experts (MoE) model with ~20B active parameters per token. The original model is a modified version of MiniMaxAI/MiniMax-M2.7, created by injecting MiniMaxAI/MiniMax-M2.5 experts and doubling the active experts per token. It outperforms the base M2.7 on Single-turn SWE-Bench Verified.
โ ๏ธ Hardware Reality Check: Even though only ~20B parameters are active at inference, quantization must process and store all 460B parameters. The smallest 1-bit quant is ~100 GB; Q4_K_M is ~300 GB. Conversion may require ~1 TB of intermediate disk space.
Available Quants
| Quant | Size (approx) | Notes |
|---|---|---|
IQ1_S |
~100 GB | Smallest, 1-bit-ish. May not be supported on all llama.cpp builds. |
Q2_K |
~175 GB | 2-bit K-quant |
Q3_K_M |
~250 GB | 3-bit K-quant, balanced |
Q4_K_M |
~300 GB | 4-bit K-quant, recommended quality/size tradeoff |
Q5_K_M |
~360 GB | 5-bit K-quant |
Q8_0 |
~500 GB | 8-bit, near-lossless |
Download
# Example: Q4_K_M
huggingface-cli download freakyskittle/MiniMax-M2.75-460B-A20B-GGUF --include "*Q4_K_M.gguf"
Usage with llama.cpp
./llama-server \
-m MiniMax-M2.75-460B-A20B.Q4_K_M.gguf \
-c 32768 \
--temp 1.0 \
--top-p 0.95 \
--top-k 40
Recommended inference parameters from the original model authors:
temperature=1.0top_p=0.95top_k=40
Default system prompt:
You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax.
Original Model Capabilities
Model Self-Evolution: M2.7 initiates a cycle of model self-evolution. An internal version autonomously optimized a programming scaffold over 100+ rounds, achieving a 30% performance improvement. On MLE Bench Lite, M2.7 achieved a 66.6% medal rate.
Professional Software Engineering: On SWE-Pro, M2.7 achieved 56.22%, matching GPT-5.3-Codex. Strong performance on SWE Multilingual (76.5), Multi SWE Bench (52.7), VIBE-Pro (55.6%), Terminal Bench 2 (57.0%), and NL2Repo (39.8%). Supports native Agent Teams for multi-agent collaboration.
Professional Work: ELO score of 1495 on GDPval-AA (highest among open-weight models). Handles Word, Excel, and PPT with high-fidelity multi-round editing. On Toolathon, reached 46.3% accuracy with 97% skill compliance across 40+ complex skills.
Entertainment: Strengthened character consistency and emotional intelligence.
Quantization Scripts
Scripts used for quantization (disk-streamed, no full Python memory load):
quant_minimax_m2.pyโ downloads, converts to GGUF, and quantizes.setup_llamacpp.pyโ clones/builds llama.cpp.
python quant_minimax_m2.py \
--quant IQ1_S Q2_K Q3_K_M Q4_K_M \
--threads 4 \
--download-workers 2 \
--work-dir /big-ssd/minimax-work \
--out-dir /big-ssd/minimax-gguf
Model Details
- Original Model: selimaktas/MiniMax-M2.75-460B-A20B
- Architecture: Mixture-of-Experts (MoE)
- Total Parameters: 460B
- Active Parameters: ~20B per token
- Context Length: 1M tokens (base model)
- Quantization Method: llama.cpp GGUF
- Original License: Modified MIT (Non-Commercial)
Citation
@misc{selimaktas_minimax-m2.75
title = {selimaktas/MiniMax-M2.75-460B-A20B},
author = {selimaktas},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/selimaktas/MiniMax-M2.75-460B-A20B}}
}
License
This quantized model is subject to the same license as the original MiniMax M2.7 / M2.75 model:
NON-COMMERCIAL LICENSE
Non-commercial use permitted based on MIT-style terms; commercial use requires prior written authorization.
Copyright (c) 2026 MiniMax
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software for non-commercial purposes, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or provide copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
1. The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
2. If the Software (or any derivative works thereof) is used for any Commercial Use, you shall prominently display "Built with MiniMax M2.7" on a related website, user interface, blogpost, about page or product documentation.
3. Any Commercial Use of the Software or any derivative work thereof is prohibited without obtaining a separate, prior written authorization from MiniMax. To request such authorization, please contact api@minimax.io with the subject line "M2.7 licensing".
4. "Commercial Use" means any use of the Software or any derivative work thereof that is primarily intended for commercial advantage or monetary compensation, which includes, without limitation: (i) offering products or services to third parties for a fee, which utilize, incorporate, or rely on the Software or its derivatives, (ii) the commercial use of APIs provided by or for the Software or its derivatives, including to support or enable commercial products, services, or operations, whether in a cloud-based, hosted, or other similar environment, and (iii) the deployment or provision of the Software or its derivatives that have been subjected to post-training, fine-tuning, instruction-tuning, or any other form of modification, for any commercial purpose.
5. Permitted Free Uses. The following uses are expressly permitted free of charge: (a) personal use, including self-hosted deployment for coding, development of applications, agents, tools, integrations, research, experimentation, or other personal purposes; (b) use by non-profit organizations, academic institutions, and researchers for non-commercial research or educational purposes; (c) modification of the Software solely for the uses described in (a) or (b) above.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Appendix: Prohibited Uses
You agree you will not use, or allow others to use, the Software or any derivatives of the Software to:
1. Generate or disseminate content prohibited by applicable laws or regulations.
2. Assist with, engage in or otherwise support any military purpose.
3. Exploit, harm, or attempt to exploit or harm minors.
4. Generate or disseminate false or misleading information with the intent to cause harm.
5. Promote discrimination, hate speech, or harmful behavior against individuals or groups based on race or ethnic origin, religion, disability, age, nationality and national origin, veteran status, sexual orientation, gender or gender identity, caste, immigration status, or any other characteristic that is associated with systemic discrimination or marginalization.
Links
- Original Model: selimaktas/MiniMax-M2.75-460B-A20B
- Base Model: MiniMaxAI/MiniMax-M2.7
- Experts Source: MiniMaxAI/MiniMax-M2.5
- MiniMax GitHub: MiniMax-AI/MiniMax-M2.7
- MiniMax Agent: https://agent.minimax.io/
- MiniMax API: https://platform.minimax.io/
- Downloads last month
- 701
1-bit
2-bit
3-bit
4-bit
5-bit
8-bit