LaViDA Variant B Seed-0 OracleAug alpha=0.2
Oracle-augmented chi-square critic branch for the seed-0 control matrix.
This repository is part of LaViDA: Latent Visitation Distribution Alignment for Mathematical Reasoning. It is a research checkpoint from the Oracle Phase-3 seed-0 matrix. The public artifact is intended for reproducibility and analysis, not as a general-purpose assistant model.
Model Details
| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-Math-7B |
| Adaptation | LoRA adapters, rank 64 on linear layers |
| Training algorithm | GRPO |
| Variant label | B_OracleAug |
| Loss mode | chi_square |
| Auxiliary weight alpha | 0.2 |
| Expert pool / data | Oracle-augmented v2 expert pool: 12,317 traces = 8,963 self + 3,354 filtered Oracle. |
| Training steps | 2,000 RL optimizer steps |
| Evaluation prompt path | base-model cot-4shot |
| W&B run id | ulp4wfzz |
| W&B project | lavida-mvm |
Training Method
GRPO plus reward-gated dual Pearson chi-square critic over frozen VAE latents.
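For orientation, the Pearson chi-square divergence has a standard variational (dual) form, which is presumably what a "dual" critic estimates. This card does not state the exact objective, so the notation below is an assumption rather than the LaViDA formulation: $P$ and $Q$ stand for two distributions over latents (which of them is the expert pool and which is the policy is not specified here), $T$ is the critic, and the reward gate is not modeled at all.

```latex
\chi^2(P \,\|\, Q)
  = \mathbb{E}_{z \sim Q}\!\left[\left(\frac{dP}{dQ}(z) - 1\right)^{2}\right]
  = \sup_{T}\;\Big\{ \mathbb{E}_{z \sim P}\big[T(z)\big]
      - \mathbb{E}_{z \sim Q}\big[T(z) + \tfrac{1}{4}\,T(z)^{2}\big] \Big\},
\qquad
T^{*}(z) = 2\left(\frac{dP}{dQ}(z) - 1\right).
```

The second equality follows from the convex conjugate of $f(u) = (u-1)^2$, which is $f^{*}(t) = t + t^{2}/4$; the supremum is attained at $T^{*}$ when $P$ is absolutely continuous with respect to $Q$.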
Shared setup:
- Binary exact-match reward using the Qwen2.5-Math evaluation stack.
- Group sampling with GRPO on hard mathematical prompts.
- Frozen base-model hidden-state feature extraction over the last 4 transformer layers.
- Feature vector `psi = [h_start || h_end || h_mean || delta_H]` in `R^14336` for LaViDA variants.
- Frozen VAE latent dimension `256` for auxiliary branches.
- Maximum completion length 3072 tokens.
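The first two items of the shared setup can be illustrated with a minimal sketch of a binary reward feeding GRPO's group-relative advantages. The names `exact_match_reward` and `group_advantages` are hypothetical, the string comparison is only a stand-in for the Qwen2.5-Math grader, and the use of the population standard deviation with a small `eps` is an assumption, not a confirmed detail of the LaViDA training code.

```python
from statistics import mean, pstdev


def exact_match_reward(predicted: str, reference: str) -> float:
    """Binary reward: 1.0 if the answers match after whitespace removal, else 0.0.

    Stand-in only: the real evaluation stack grades mathematical answers,
    which this plain string comparison does not attempt.
    """
    def normalize(text: str) -> str:
        return text.strip().replace(" ", "")

    return 1.0 if normalize(predicted) == normalize(reference) else 0.0


def group_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, a group of four sampled final answers, reference answer "42".
completions = ["42", " 42 ", "41", "7"]
rewards = [exact_match_reward(c, "42") for c in completions]
advantages = group_advantages(rewards)
```

Under this normalization, a group whose completions are all correct or all wrong yields zero advantages for every sample, so such a group contributes no policy-gradient signal.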
Seed-0 MATH-500 Results
| Metric | Value |
|---|---|
| Greedy overall (T=0) | 74.2% |
| n=8 mean correctness (T=0.6) | 74.28% |
| pass@8 | 77.0% |
| L4-5 pass@8 | 65.27% |
| Level-5 pass@8 | 54.48% |
Interpretation: Null control on mean correctness; useful evidence that the learned chi-square critic was not the winning transfer mechanism in seed 0.
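As a reading aid for the sampled metrics above, here is a minimal sketch of how mean correctness and pass@8 can be computed from per-sample grades. It assumes both numbers come from the same 8 samples per problem, in which case pass@8 reduces to "at least one sample is correct"; the function name `summarize` and the toy data are hypothetical and are not taken from the LaViDA ledger.

```python
def summarize(correct_by_problem):
    """Return (mean correctness, pass@n) for n graded samples per problem.

    correct_by_problem: list with one inner list of booleans per problem,
    where each boolean says whether that sampled completion was correct.
    """
    n_problems = len(correct_by_problem)
    # Mean correctness: average the per-problem fraction of correct samples.
    mean_correctness = sum(
        sum(samples) / len(samples) for samples in correct_by_problem
    ) / n_problems
    # pass@n with exactly n samples: a problem counts if any sample is correct.
    pass_at_n = sum(any(samples) for samples in correct_by_problem) / n_problems
    return mean_correctness, pass_at_n


# Toy data: two problems, eight samples each.
toy = [
    [True, True, True, False, True, True, False, True],  # 6 of 8 correct
    [False] * 8,                                          # never solved
]
mean_correctness, pass_at_8 = summarize(toy)
```

Assuming all 500 MATH-500 problems are scored, 74.2% greedy accuracy corresponds to 371 solved problems and 77.0% pass@8 to 385.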
Related Data
Related model repos:
- Pritish92/lavida-variant-A-seed0-a-2000
- Pritish92/lavida-variant-B-seed0-oracleaug-alpha0p2
- Pritish92/lavida-variant-D-seed0-oracleaug-alpha0p001
- Pritish92/lavida-variant-B-seed0-selfdistill-alpha0p02
How To Use
This checkpoint is expected to be used with the base model and PEFT/LoRA loading:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base_id = "Qwen/Qwen2.5-Math-7B"
adapter_id = "Pritish92/lavida-variant-B-seed0-oracleaug-alpha0p2"
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
Use the same base-model cot-4shot evaluation path used in the LaViDA experiments for comparable MATH-500 numbers.
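A few-shot, greedy-decoding call might look like the sketch below. The exemplars and the `Question:` / `Answer:` layout are placeholders: the real cot-4shot demonstrations and template live in the Qwen2.5-Math evaluation stack and are not reproduced here, so this will not by itself reproduce the reported numbers. `build_prompt` and `generate_answer` are hypothetical helper names; `model` and `tokenizer` are the objects loaded above.

```python
# Placeholder exemplars (hypothetical), NOT the actual cot-4shot demonstrations.
FEW_SHOT_EXAMPLES = [
    ("What is 2 + 3?", "2 + 3 = 5. The answer is 5."),
    ("What is 10 / 2?", "10 / 2 = 5. The answer is 5."),
    ("What is 4 * 6?", "4 * 6 = 24. The answer is 24."),
    ("What is 9 - 7?", "9 - 7 = 2. The answer is 2."),
]


def build_prompt(question, examples=FEW_SHOT_EXAMPLES):
    """Concatenate worked examples, then the new question with an open answer."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


def generate_answer(model, tokenizer, question, max_new_tokens=3072):
    """Greedy decoding (the T=0 setting), capped at the 3072-token limit."""
    prompt = build_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, do_sample=False, max_new_tokens=max_new_tokens
    )
    # Drop the prompt tokens so only the newly generated completion is decoded.
    completion_ids = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(completion_ids, skip_special_tokens=True)
```

For the sampled metrics, the same call would instead use `do_sample=True` with `temperature=0.6` and be repeated 8 times per problem.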
Limitations
- This is a seed-0 research checkpoint; the main `A_2000` vs `D_OracleAug` replication target is still seed 1.
- Results are currently for MATH-500 only in the locked public ledger.
- The model was trained for mathematical reasoning experiments and should not be treated as a general assistant.
- Oracle-generated traces are machine-generated and filtered, not human process annotations.
- The chi-square critic branches are controls / negative evidence in seed 0; the positive RL-side mechanism candidate is nearest-expert MSE (`D_OracleAug`).
Citation
@misc{saha2026lavidaboracleaug,
title = {LaViDA Variant B Seed-0 OracleAug alpha=0.2},
author = {Saha, Pritish},
year = {2026},
url = {https://huggingface.co/Pritish92/lavida-variant-B-seed0-oracleaug-alpha0p2}
}