LaViDA Variant B Seed-0 OracleAug alpha=0.2

Oracle-augmented chi-square critic branch for the seed-0 control matrix.

This repository is part of LaViDA: Latent Visitation Distribution Alignment for Mathematical Reasoning. It is a research checkpoint from the Oracle Phase-3 seed-0 matrix. The public artifact is intended for reproducibility and analysis, not as a general-purpose assistant model.

Model Details

| Field | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-Math-7B |
| Adaptation | LoRA adapters, rank 64 on linear layers |
| Training algorithm | GRPO |
| Variant label | B_OracleAug |
| Loss mode | chi_square |
| Auxiliary weight alpha | 0.2 |
| Expert pool / data | Oracle-augmented v2 expert pool: 12,317 traces = 8,963 self + 3,354 filtered Oracle |
| Training steps | 2,000 RL optimizer steps |
| Evaluation prompt path | base-model cot-4shot |
| W&B run id | ulp4wfzz |
| W&B project | lavida-mvm |

Training Method

The policy is trained with GRPO plus a reward-gated dual Pearson chi-square critic that operates on frozen VAE latents.
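For reference, the Pearson chi-square divergence and one standard variational (dual) form of it are written out below. The symbols P (expert latent distribution), Q (policy latent distribution), z (a latent), and f (the critic) are notation assumed here for illustration. This card does not state the exact objective, the direction of the divergence, or how the reward gate enters, so this is a textbook identity rather than the LaViDA loss.

```latex
\chi^2(P \,\|\, Q)
  = \mathbb{E}_{z \sim Q}\!\left[\left(\frac{dP}{dQ}(z) - 1\right)^{2}\right]
  = \sup_{f}\; \mathbb{E}_{z \sim P}\big[f(z)\big]
    - \mathbb{E}_{z \sim Q}\!\left[f(z) + \tfrac{1}{4} f(z)^{2}\right]
```

The supremum is attained at f = 2(dP/dQ - 1), which is why a learned critic can stand in for the unknown density ratio.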

Shared setup:

  • Binary exact-match reward using the Qwen2.5-Math evaluation stack.
  • Group sampling with GRPO on hard mathematical prompts.
  • Frozen base-model hidden-state feature extraction over the last 4 transformer layers.
  • Feature vector psi = [h_start || h_end || h_mean || delta_H] in R^14336 for LaViDA variants.
  • Frozen VAE latent dimension 256 for auxiliary branches.
  • Maximum completion length 3072 tokens.
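Two items in the list above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the training code: `group_relative_advantages` uses the common GRPO convention of standardizing rewards within a prompt's sample group, and `build_psi` assumes the per-token hidden states have already been pooled over the last 4 layers and that delta_H means h_end - h_start. Both function names are hypothetical. The stated dimension 14336 is consistent with four blocks at a hidden size of 3584.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one prompt's sample group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


def build_psi(hidden):
    """Concatenate [h_start, h_end, h_mean, delta_H] from per-token vectors.

    `hidden` is a list of T vectors (one per token), assumed to be already
    pooled over the last 4 layers. delta_H = h_end - h_start is an assumption.
    """
    h_start, h_end = hidden[0], hidden[-1]
    n_tokens = len(hidden)
    h_mean = [sum(col) / n_tokens for col in zip(*hidden)]
    delta_h = [e - s for s, e in zip(h_start, h_end)]
    return list(h_start) + list(h_end) + h_mean + delta_h


# Binary exact-match rewards for one hypothetical group of 8 completions.
advantages = group_relative_advantages([1, 0, 0, 1, 1, 0, 0, 0])

# Toy hidden states: 3 tokens, hidden size 4 (the real model would use 3584).
psi = build_psi([[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0], [2.0, 3.0, 4.0, 5.0]])
```

With binary rewards, a group in which every completion is correct (or every one is wrong) yields all-zero advantages, so such prompts contribute no policy gradient under this convention.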

Seed-0 MATH-500 Results

| Metric | Value |
| --- | --- |
| Greedy overall (T=0) | 74.2% |
| n=8 mean correctness (T=0.6) | 74.28% |
| pass@8 | 77.0% |
| L4-5 pass@8 | 65.27% |
| Level-5 pass@8 | 54.48% |

Interpretation: this variant is a null control on mean correctness, which is useful evidence that the learned chi-square critic was not the winning transfer mechanism in seed 0.
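To make the two sampled metrics concrete, the sketch below computes mean correctness and pass@k from per-problem counts of correct samples. It uses the widely used unbiased pass@k estimator; whether the LaViDA evaluation stack computes pass@8 this way is an assumption, and the function names and toy counts are hypothetical. With n = k = 8, the estimator reduces to "at least one of the 8 samples is correct".

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def summarize(correct_counts, n=8, k=8):
    """Return (mean correctness, pass@k), both as fractions in [0, 1]."""
    total = len(correct_counts)
    mean_correct = sum(c / n for c in correct_counts) / total
    p_at_k = sum(pass_at_k(n, c, k) for c in correct_counts) / total
    return mean_correct, p_at_k


# Hypothetical counts of correct samples (out of n=8) for four problems.
toy_counts = [8, 5, 0, 1]
mean_correct, p8 = summarize(toy_counts)
```

Mean correctness rewards consistency across samples, while pass@8 only asks whether any sample succeeds, so the two can diverge on problems the model solves only occasionally.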

Related Data

Related model repos:

How To Use

Load this checkpoint as a LoRA adapter on top of the base model with PEFT:

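from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Math-7B"
adapter_id = "Pritish92/lavida-variant-B-seed0-oracleaug-alpha0p2"

# The tokenizer comes from the base model; the adapter repo only holds LoRA weights.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
# device_map="auto" places the base weights on the available devices.
base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True, device_map="auto")
# Attach the LoRA adapter to the frozen base model.
model = PeftModel.from_pretrained(base, adapter_id)
# Inference mode: disables dropout.
model.eval()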

Use the same base-model cot-4shot evaluation path used in the LaViDA experiments for comparable MATH-500 numbers.
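As an illustration of what a 4-shot chain-of-thought prompt looks like structurally, the helper below joins worked exemplars and the target problem into one string. The `build_fewshot_prompt` name and the "Problem:/Solution:" template are hypothetical; the actual cot-4shot template and exemplars come from the Qwen2.5-Math evaluation stack and are not reproduced here, so numbers obtained with this sketch are not comparable to the table above.

```python
def build_fewshot_prompt(exemplars, question):
    """Join (problem, solution) exemplars and the target problem into one prompt.

    The "Problem:/Solution:" layout is a placeholder, not the real template.
    """
    parts = [f"Problem: {p}\nSolution: {s}" for p, s in exemplars]
    # Leave the final solution open so the model continues from here.
    parts.append(f"Problem: {question}\nSolution:")
    return "\n\n".join(parts)


# Four toy exemplars standing in for the real worked solutions.
toy_exemplars = [
    ("What is 1 + 1?", "1 + 1 = 2. The answer is 2."),
    ("What is 2 * 3?", "2 * 3 = 6. The answer is 6."),
    ("What is 10 - 4?", "10 - 4 = 6. The answer is 6."),
    ("What is 9 / 3?", "9 / 3 = 3. The answer is 3."),
]
prompt = build_fewshot_prompt(toy_exemplars, "What is 7 + 5?")
```

The resulting string can be tokenized and passed to `model.generate`, for example with `do_sample=False` for greedy decoding; an appropriate `max_new_tokens` for evaluation is not stated in this card.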

Limitations

  • This is a seed-0 research checkpoint; the main A_2000 vs D_OracleAug replication target is still seed 1.
  • Results are currently for MATH-500 only in the locked public ledger.
  • The model was trained for mathematical reasoning experiments and should not be treated as a general assistant.
  • Oracle-generated traces are machine-generated and filtered, not human process annotations.
  • The chi-square critic branches are controls / negative evidence in seed 0; the positive RL-side mechanism candidate is nearest-expert MSE (D_OracleAug).

Citation

@misc{saha2026lavidaboracleaug,
  title  = {LaViDA Variant B Seed-0 OracleAug alpha=0.2},
  author = {Saha, Pritish},
  year   = {2026},
  url    = {https://huggingface.co/Pritish92/lavida-variant-B-seed0-oracleaug-alpha0p2}
}