SpoomplesMaxx Base – Gemma 3 27B

SpoomplesMaxx Base is a continued pre-training (CPT) run on top of unsloth/gemma-3-27b-pt, targeting improved creative writing, character voice, narrative prose, and multilingual fluency in English and Brazilian Portuguese.

This is the final release checkpoint of the CPT stage. It is a base model and is not instruction-tuned. Downstream SFT and DPO stages are planned.

Part of the SpoomplesMaxx project, a hobbyist ML research effort focused on creative writing and roleplay capability in open base models.


Model Details

| Property | Value |
| --- | --- |
| Base model | google/gemma-3-27b-pt |
| Architecture | Gemma 3 27B |
| Parameters | ~27B |
| Training stage | Continued Pre-Training (CPT) |
| Training framework | TRL + Unsloth |
| Training type | LoRA on text layers only (vision components frozen) |
| Languages | English (en), Brazilian Portuguese (pt) |

Why Gemma 3 27B?

After earlier CPT runs on GLM-4-32B, Qwen3-14B, and SmolLM3 surfaced issues with repetition and inconsistent cultural knowledge absorption, Gemma 3 27B PT was selected for its strong out-of-the-box creative writing quality and multilingual coverage.


Uses

Direct Use

This model is suitable for:

  • Creative writing and prose generation
  • Character roleplay and collaborative fiction
  • Multilingual text generation (EN/PT)
  • Base for downstream SFT/DPO fine-tuning

As a base model, it does not follow instructions and has no chat template. Use it with a completion interface or apply your own prompt structure.
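For example, a completion-style call with the Hugging Face transformers library might look like the sketch below. The repo id matches this card, but the prompt helper, sampling settings, and dtype/device choices are illustrative assumptions, not the project's actual inference setup.

```python
# Completion-style usage sketch for a base model with no chat template.

def build_prompt(title: str, opening: str) -> str:
    # A base model simply continues raw text, so any prompt structure
    # (headings, story openings, character sheets) is up to you.
    return f"# {title}\n\n{opening}"

if __name__ == "__main__":
    # Heavy imports and the 27B weight download stay behind the guard.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "aimeri/spoomplesmaxx-base-gemma3-27b"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    prompt = build_prompt(
        "The Lighthouse",
        "The keeper had not spoken aloud in three winters.",
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=200, do_sample=True)
    print(tok.decode(out[0], skip_special_tokens=True))
```

Since there is no chat template, whatever structure you put in the prompt is the structure the model continues.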

Downstream Use

The intended pipeline is CPT (this model) → SFT → DPO. The SFT and DPO stages are under active development and will be released separately.

Out-of-Scope Use

  • Drop-in replacement for instruction-following or chat models (no system prompt, no chat template)
  • Production deployment without further alignment
  • Tasks requiring factual grounding or safety constraints (this is an uncensored creative base)

Training Details

Training Data

The training corpus combines two sources, concatenated and shuffled (seed 1985) before a 99.8% / 0.2% train/eval split:

| Source | Rows | Description |
| --- | --- | --- |
| aimeri/spoomplesmaxx-cpt-raw-small | 91,657 | Broad creative writing and prose CPT corpus |
| characters_small.jsonl | 10,000 | Curated character-focused entries |
| **Total** | 101,657 | |
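The shuffle-then-split step can be sketched in plain Python. The seed and row count come from this card; the exact split mechanics of the actual training script are an assumption.

```python
import random

def shuffle_and_split(rows, seed=1985, eval_frac=0.002):
    # Shuffle the concatenated corpus with a fixed seed, then hold out
    # 0.2% of rows for eval (the 99.8% / 0.2% split described above).
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_frac)
    return shuffled[n_eval:], shuffled[:n_eval]

train, evaluation = shuffle_and_split(range(101_657))
print(len(train), len(evaluation))  # 101454 203
```

Fixing the seed makes the split reproducible across runs, so the eval set stays disjoint from training data if the corpus is re-processed.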

Tokenization appends EOS at document boundaries. Documents are then concatenated and chunked into fixed-length sequences of 16,384 tokens, with the trailing remainder dropped. This yields the final packed training sequences.
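A minimal sketch of that packing step, on already-tokenized documents (function and variable names are illustrative, not the project's actual code):

```python
def pack_sequences(token_docs, eos_id, seq_len=16_384):
    # Append EOS at each document boundary, concatenate everything into
    # one token stream, then chunk into fixed-length sequences, dropping
    # the trailing remainder that does not fill a full sequence.
    stream = []
    for doc in token_docs:
        stream.extend(doc)
        stream.append(eos_id)
    n_full = len(stream) // seq_len
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]

# Toy example with seq_len=4: two docs of 3 and 2 tokens plus EOS markers
# form a 7-token stream, which packs into one full sequence of 4.
packed = pack_sequences([[1, 2, 3], [4, 5]], eos_id=0, seq_len=4)
print(packed)  # [[1, 2, 3, 0]]
```

Packing wastes at most one partial sequence per epoch while keeping every training sequence exactly 16,384 tokens, which avoids padding entirely.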


Evaluation

No formal benchmarks have been run on this model. Evaluation is currently qualitative: creative writing samples, prose coherence, and character voice consistency across English and Portuguese.

If you run benchmarks on this model, please open a discussion; contributions are welcome.


Project History

SpoomplesMaxx has gone through several base model iterations:

| Run | Base | Status |
| --- | --- | --- |
| v1 | SmolLM3 3B | Experimental, archived |
| v2 | GLM-4-32B | Repetition issues, archived |
| v3 | Qwen3-14B | Released as aimeri/spoomplesmaxx-base-qwen3-14b |
| v4 (this) | Gemma 3 27B | Current |

Model Card Authors

  • aimeri

Model Card Contact

Open a discussion on the repository page.
