Instructions to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="freakyskittle/MiniMax-M2.75-460B-A20B-GGUF",
	filename="MiniMax-M2.75-460B-A20B.Q1_0.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M

LM Studio
Jan

vLLM

How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "freakyskittle/MiniMax-M2.75-460B-A20B-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "freakyskittle/MiniMax-M2.75-460B-A20B-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M

Ollama
How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with Ollama:
```
ollama run hf.co/freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
```

Unsloth Studio

How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for freakyskittle/MiniMax-M2.75-460B-A20B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for freakyskittle/MiniMax-M2.75-460B-A20B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for freakyskittle/MiniMax-M2.75-460B-A20B-GGUF to start chatting

Docker Model Runner
How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with Docker Model Runner:
```
docker model run hf.co/freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M
```

Lemonade

How to use freakyskittle/MiniMax-M2.75-460B-A20B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull freakyskittle/MiniMax-M2.75-460B-A20B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.MiniMax-M2.75-460B-A20B-GGUF-Q4_K_M

List all available models

lemonade list

MiniMax-M2.75-460B-A20B-GGUF

Community-quantized GGUF versions of selimaktas/MiniMax-M2.75-460B-A20B.

This is a 460B parameter Mixture-of-Experts (MoE) model with ~20B active parameters per token. The original model is a modified version of MiniMaxAI/MiniMax-M2.7, created by injecting MiniMaxAI/MiniMax-M2.5 experts and doubling the active experts per token. It outperforms the base M2.7 on Single-turn SWE-Bench Verified.

⚠️ Hardware Reality Check: Even though only ~20B parameters are active at inference, quantization must process and store all 460B parameters. The smallest 1-bit quant is ~100 GB; Q4_K_M is ~300 GB. Conversion may require ~1 TB of intermediate disk space.

Available Quants

Quant	Size (approx)	Notes
`IQ1_S`	~100 GB	Smallest, 1-bit-ish. May not be supported on all llama.cpp builds.
`Q2_K`	~175 GB	2-bit K-quant
`Q3_K_M`	~250 GB	3-bit K-quant, balanced
`Q4_K_M`	~300 GB	4-bit K-quant, recommended quality/size tradeoff
`Q5_K_M`	~360 GB	5-bit K-quant
`Q8_0`	~500 GB	8-bit, near-lossless

Download

# Example: Q4_K_M
huggingface-cli download freakyskittle/MiniMax-M2.75-460B-A20B-GGUF --include "*Q4_K_M.gguf"

Usage with llama.cpp

./llama-server \
  -m MiniMax-M2.75-460B-A20B.Q4_K_M.gguf \
  -c 32768 \
  --temp 1.0 \
  --top-p 0.95 \
  --top-k 40

Recommended inference parameters from the original model authors:

temperature=1.0
top_p=0.95
top_k=40

Default system prompt:

You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax.

Original Model Capabilities

Model Self-Evolution: M2.7 initiates a cycle of model self-evolution. An internal version autonomously optimized a programming scaffold over 100+ rounds, achieving a 30% performance improvement. On MLE Bench Lite, M2.7 achieved a 66.6% medal rate.

Professional Software Engineering: On SWE-Pro, M2.7 achieved 56.22%, matching GPT-5.3-Codex. Strong performance on SWE Multilingual (76.5), Multi SWE Bench (52.7), VIBE-Pro (55.6%), Terminal Bench 2 (57.0%), and NL2Repo (39.8%). Supports native Agent Teams for multi-agent collaboration.

Professional Work: ELO score of 1495 on GDPval-AA (highest among open-weight models). Handles Word, Excel, and PPT with high-fidelity multi-round editing. On Toolathon, reached 46.3% accuracy with 97% skill compliance across 40+ complex skills.

Entertainment: Strengthened character consistency and emotional intelligence.

Quantization Scripts

Scripts used for quantization (disk-streamed, no full Python memory load):

quant_minimax_m2.py — downloads, converts to GGUF, and quantizes.
setup_llamacpp.py — clones/builds llama.cpp.

python quant_minimax_m2.py \
  --quant IQ1_S Q2_K Q3_K_M Q4_K_M \
  --threads 4 \
  --download-workers 2 \
  --work-dir /big-ssd/minimax-work \
  --out-dir /big-ssd/minimax-gguf

Model Details

Original Model: selimaktas/MiniMax-M2.75-460B-A20B
Architecture: Mixture-of-Experts (MoE)
Total Parameters: 460B
Active Parameters: ~20B per token
Context Length: 1M tokens (base model)
Quantization Method: llama.cpp GGUF
Original License: Modified MIT (Non-Commercial)

Citation

@misc{selimaktas_minimax-m2.75
  title        = {selimaktas/MiniMax-M2.75-460B-A20B},
  author       = {selimaktas},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/selimaktas/MiniMax-M2.75-460B-A20B}}
}

License

This quantized model is subject to the same license as the original MiniMax M2.7 / M2.75 model:

NON-COMMERCIAL LICENSE
Non-commercial use permitted based on MIT-style terms; commercial use requires prior written authorization.
Copyright (c) 2026 MiniMax
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software for non-commercial purposes, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or provide copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
1. The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
2. If the Software (or any derivative works thereof) is used for any Commercial Use, you shall prominently display "Built with MiniMax M2.7" on a related website, user interface, blogpost, about page or product documentation.
3. Any Commercial Use of the Software or any derivative work thereof is prohibited without obtaining a separate, prior written authorization from MiniMax.  To request such authorization, please contact api@minimax.io with the subject line "M2.7 licensing".
4. "Commercial Use" means any use of the Software or any derivative work thereof that is primarily intended for commercial advantage or monetary compensation, which includes, without limitation: (i) offering products or services to third parties for a fee, which utilize, incorporate, or rely on the Software or its derivatives, (ii) the commercial use of APIs provided by or for the Software or its derivatives, including to support or enable commercial products, services, or operations, whether in a cloud-based, hosted, or other similar environment, and (iii) the deployment or provision of the Software or its derivatives that have been subjected to post-training, fine-tuning, instruction-tuning, or any other form of modification, for any commercial purpose.
5. Permitted Free Uses. The following uses are expressly permitted free of charge: (a) personal use, including self-hosted deployment for coding, development of applications, agents, tools, integrations, research, experimentation, or other personal purposes; (b) use by non-profit organizations, academic institutions, and researchers for non-commercial research or educational purposes; (c) modification of the Software solely for the uses described in (a) or (b) above.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Appendix: Prohibited Uses
You agree you will not use, or allow others to use, the Software or any derivatives of the Software to:
1. Generate or disseminate content prohibited by applicable laws or regulations.
2. Assist with, engage in or otherwise support any military purpose.
3. Exploit, harm, or attempt to exploit or harm minors.
4. Generate or disseminate false or misleading information with the intent to cause harm.
5. Promote discrimination, hate speech, or harmful behavior against individuals or groups based on race or ethnic origin, religion, disability, age, nationality and national origin, veteran status, sexual orientation, gender or gender identity, caste, immigration status, or any other characteristic that is associated with systemic discrimination or marginalization.

Model tree for freakyskittle/MiniMax-M2.75-460B-A20B-GGUF

MiniMaxAI/MiniMax-M2.5

MiniMaxAI/MiniMax-M2.7

selimaktas/MiniMax-M2.75-460B-A20B

Merge model

this model

freakyskittle
/

MiniMax-M2.75-460B-A20B-GGUF