Valendra Qwen3.5-4B Demon Angel (Experimental model)

Valendra Qwen3.5-4B Demon Angel is a merged model, created by applying the LoRA adapter trained in this repository to the Qwen/Qwen3.5-4B base model. The name is deliberately literal: it reflects the model's core internal opposition between a demon that attacks weak reasoning and an angel that proposes the answer.

Overview

This model was trained to internalize a structured self-debate pattern before emitting a visible answer.

  • An angel proposes a solution.
  • A demon attacks weak assumptions, blind spots, and overconfidence.
  • A judge synthesizes the outcome and chooses the final stance.

The intent is not to expose chain-of-thought in production. The intent is to make the visible answer stronger by forcing internal critique and synthesis first.
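As a rough illustration of that three-role pattern, the internal block can be pictured as a structured template followed by the visible answer. This is a sketch only: the tag names and layout below are assumptions for illustration and may not match the format actually used in training.

```python
# Hypothetical internal debate layout. The <debate>/<angel>/<demon>/<judge>
# tag names are assumptions, not the model's confirmed trained format.
INTERNAL_TEMPLATE = (
    "<debate>\n"
    "<angel>{angel}</angel>\n"
    "<demon>{demon}</demon>\n"
    "<judge>{judge}</judge>\n"
    "</debate>\n"
    "{answer}"
)

sample = INTERNAL_TEMPLATE.format(
    angel="Propose: the answer is 42, because the constraint implies ...",
    demon="Attack: the proposal assumes the constraint holds without checking edge cases.",
    judge="Synthesis: the objection is minor; the proposal stands.",
    answer="42",
)
# Everything before the final line is internal; only "{answer}" is served.
```

The key property is that the debate block is a prefix that can be cleanly separated from the answer at serving time.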

Relation to SDRL

This model is aligned in spirit with "Prepare Reasoning Language Models for Multi-Agent Debate with Self-Debate Reinforcement Learning" (arXiv:2601.22297v1).

It is not a reproduction of SDRL. Instead, it follows the same broad intuition inside this repository's own stack: a single model should improve when it learns to work across multiple reasoning trajectories instead of solving every prompt in isolation.

Details

  • Base model: Qwen/Qwen3.5-4B
  • Suggested repo: valendra/qwen3.5-4b-demon-angel
  • Training flow: LoRA SFT, then GRPO-style reinforcement learning, then local merge
  • Internal format: a single block with angel, demon, and judge roles
  • Serving goal: expose only the visible answer after the internal reasoning block
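Under that serving goal, the post-processing step could look like the following minimal sketch. It assumes a hypothetical `<debate> ... </debate>` wrapper around the internal block; the actual delimiter used by the model may differ.

```python
import re


def extract_visible_answer(raw_output: str) -> str:
    """Strip a leading internal reasoning block so only the visible
    answer is served. The <debate> tag name is an assumption; adjust
    it to whatever delimiter the model actually emits."""
    return re.sub(r"(?s)^.*?</debate>\s*", "", raw_output, count=1)
```

If no internal block is present, the function returns the output unchanged, which makes it safe to apply unconditionally in a serving pipeline.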

Intended Use

Use this model for experiments where you want stronger internal critique and synthesis than a plain instruction-tuned baseline, while still serving only a final answer.

Limitations

  • This model was trained with synthetic and programmatic supervision, so it should be validated on real downstream prompts before production use.
  • It is designed around a learned internal debate format, not around unrestricted free-form reasoning traces.
  • This model card describes the merged artifact produced in this repository. It does not claim benchmark parity with SDRL or paper-level reproduction.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "valendra/qwen3.5-4b-demon-angel"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat-formatted prompt and generate a completion.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Why is the sky blue?"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))