LaViDA Variant B Seed-0 OracleAug alpha=0.2

Oracle-augmented chi-square critic branch for the seed-0 control matrix.

This repository is part of LaViDA: Latent Visitation Distribution Alignment for Mathematical Reasoning. It is a research checkpoint from the Oracle Phase-3 seed-0 matrix. The public artifact is intended for reproducibility and analysis, not as a general-purpose assistant model.

Model Details

| Field | Value |
|---|---|
| Base model | `Qwen/Qwen2.5-Math-7B` |
| Adaptation | LoRA adapters, rank 64 on linear layers |
| Training algorithm | GRPO |
| Variant label | `B_OracleAug` |
| Loss mode | `chi_square` |
| Auxiliary weight `alpha` | `0.2` |
| Expert pool / data | Oracle-augmented v2 expert pool: 12,317 traces = 8,963 self + 3,354 filtered Oracle |
| Training steps | 2,000 RL optimizer steps |
| Evaluation prompt path | base-model `cot-4shot` |
| W&B run id | `ulp4wfzz` |
| W&B project | `lavida-mvm` |

Training Method

Training combines GRPO with a reward-gated dual Pearson chi-square critic that operates on frozen VAE latents.
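
This card does not spell out the critic objective. For orientation only, a standard variational (dual) form of the Pearson chi-square divergence is given here. The symbols are assumptions, not taken from the LaViDA code: P and Q stand for two distributions over VAE latents z (for example expert and policy visitation), and f is the critic function. How the reward gate and the `alpha` weight enter the actual loss is not shown.

```latex
\chi^2(P \,\|\, Q)
  = \mathbb{E}_{z \sim Q}\!\left[\left(\frac{p(z)}{q(z)} - 1\right)^{2}\right]
  = \sup_{f}\; \mathbb{E}_{z \sim P}\big[f(z)\big]
    - \mathbb{E}_{z \sim Q}\!\left[f(z) + \tfrac{1}{4} f(z)^{2}\right]
```

The supremum is attained at f(z) = 2 (p(z)/q(z) - 1), so a trained critic of this form acts as a density-ratio estimate between the two latent distributions.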

Shared setup:

Binary exact-match reward using the Qwen2.5-Math evaluation stack.
Group sampling with GRPO on hard mathematical prompts.
Frozen base-model hidden-state feature extraction over the last 4 transformer layers.
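Feature vector psi = [h_start || h_end || h_mean || delta_H] in R^14336 for LaViDA variants; the dimension is consistent with four concatenated blocks of the base model's 3584-dimensional hidden state.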
Frozen VAE latent dimension 256 for auxiliary branches.
Maximum completion length 3072 tokens.
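
The binary reward and group sampling above fit GRPO's group-relative advantage. A minimal sketch follows, assuming the common mean/std normalization within each prompt's group. The function name, the `eps` constant, and the use of the population standard deviation are illustrative choices, not taken from the LaViDA training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples for the same prompt.

    `rewards` holds one binary exact-match reward per sampled completion.
    `eps` (an assumed constant) avoids division by zero when every sample
    in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ here
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, a hypothetical group of 8 completions, 3 of them correct.
rewards = [1, 0, 0, 1, 0, 0, 0, 1]
advantages = group_relative_advantages(rewards)
```

Correct completions get a positive advantage and incorrect ones a negative advantage. A group where all samples agree (all correct or all wrong) yields zero advantage everywhere, which is one reason hard prompts with mixed outcomes carry the most signal.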

Seed-0 MATH-500 Results

| Metric | Value |
|---|---|
| Greedy overall (`T=0`) | 74.2% |
| `n=8` mean correctness (`T=0.6`) | 74.28% |
| pass@8 | 77.0% |
| L4-5 pass@8 | 65.27% |
| Level-5 pass@8 | 54.48% |

Interpretation: this variant is a null control on mean correctness. It is useful evidence that the learned chi-square critic was not the winning transfer mechanism in seed 0.
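
To make the two sampled metrics concrete, here is a small sketch of how mean correctness and pass@8 differ when both are computed from the same 8 samples per problem. The function names and the toy `results` data are hypothetical, and the exact scoring script used for the table is not shown in this card. With exactly 8 samples, pass@8 reduces to "at least one sample is correct".

```python
def mean_correctness(results):
    """Fraction of all sampled completions that are correct."""
    total = sum(len(samples) for samples in results)
    return sum(sum(samples) for samples in results) / total


def pass_at_n(results):
    """Fraction of problems with at least one correct sample."""
    return sum(any(samples) for samples in results) / len(results)


# Toy data: 4 problems x 8 samples, 1 = exact match, 0 = miss.
results = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 1, 1],
]

print(mean_correctness(results))  # 15 correct out of 32 samples
print(pass_at_n(results))         # 3 of 4 problems solved at least once
```

Mean correctness rewards consistency across samples, while pass@8 only asks whether the problem is solvable at all within the sample budget, so the two can move independently.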

Related Data

Related model repos:

How To Use

Load the base model first, then attach this checkpoint's LoRA adapter with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Math-7B"
adapter_id = "Pritish92/lavida-variant-B-seed0-oracleaug-alpha0p2"

# Load the tokenizer and the base weights from the base model repo.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True, device_map="auto")

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()  # inference mode: disables dropout

For MATH-500 numbers comparable to those above, evaluate with the same base-model `cot-4shot` prompt path used in the LaViDA experiments.

Limitations

This is a seed-0 research checkpoint; the main A_2000 vs D_OracleAug replication target is still seed 1.
Results are currently for MATH-500 only in the locked public ledger.
The model was trained for mathematical reasoning experiments and should not be treated as a general assistant.
Oracle-generated traces are machine-generated and filtered, not human process annotations.
The chi-square critic branches are controls / negative evidence in seed 0; the positive RL-side mechanism candidate is nearest-expert MSE (D_OracleAug).

Citation

@misc{saha2026lavidaboracleaug,
  title  = {LaViDA Variant B Seed-0 OracleAug alpha=0.2},
  author = {Saha, Pritish},
  year   = {2026},
  url    = {https://huggingface.co/Pritish92/lavida-variant-B-seed0-oracleaug-alpha0p2}
}
