Coconut Qwen2.5-7B Flawed Fictions v2

This repository contains a Coconut checkpoint packaged for both decoding modes:

  • standard decoding from the HF model files at repo root
  • Coconut latent decoding from latent_checkpoint.pt

A single snapshot_download() call retrieves everything needed for either evaluation path.

What Is In This Repo

  • Standard HF model files at repo root for Transformers / vLLM
  • latent_checkpoint.pt with the original Coconut checkpoint needed for latent decoding
  • latent_metadata.json with c_thought, max_latent_stage, provenance, and attached eval metadata
  • artifacts/source_wandb_config.yaml, artifacts/source_wandb_summary.json, and artifacts/source_wandb_metadata.json
  • artifacts/eval_* files copied from local evaluation outputs when available

Source Provenance

Field                  Value
WandB run              qlivu0at
Run date               2025-10-30
Task                   Flawed Fictions continuity error detection
Base model             Qwen/Qwen2.5-7B-Instruct
Git commit             db8d7fffcaac2bcdddae1f539ea5dea00996cd79
Host / GPU             alexgurung-fftest-fsgsd-gmrs7 / NVIDIA H200
Original checkpoint    /mnt/disk/coconut/checkpoints/qwen-coconut-ff-v2/checkpoint_13
Checkpoint size        14.2 GB
Generalization slug    coconut_ff_v2

Usage

Standard decoding

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_dir = "agurung/qwen-coconut-ff-v2"
model = AutoModelForCausalLM.from_pretrained(repo_dir, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_dir)

Coconut latent decoding

from huggingface_hub import snapshot_download
from pathlib import Path

local_dir = Path(snapshot_download("agurung/qwen-coconut-ff-v2"))
checkpoint = local_dir / "latent_checkpoint.pt"
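Once the snapshot is local, the latent checkpoint and its metadata live side by side at the repo root; a small helper (hypothetical, not part of this repo) can resolve both in one place:

```python
from pathlib import Path


def latent_artifacts(snapshot_dir):
    """Map the Coconut latent-decoding files inside a downloaded snapshot.

    `snapshot_dir` is whatever snapshot_download() returned; the file names
    match the repo layout described above.
    """
    root = Path(snapshot_dir)
    return {
        "checkpoint": root / "latent_checkpoint.pt",
        "metadata": root / "latent_metadata.json",
    }
```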

Example evaluator invocation:

python -m litereason.experiments.generalization.evaluate_coconut \
  --checkpoint "$LOCAL_DIR/latent_checkpoint.pt" \
  --base-model-id "Qwen/Qwen2.5-7B-Instruct" \
  --mode coconut \
  --c-thought 1 \
  --max-latent-stage 10 \
  --test-file litereason/experiments/generalization/data/gsm8k.jsonl \
  --prompt-variant standard \
  --save-preds preds_gsm8k_standard.jsonl \
  --num-samples 5 \
  --use-chat-template
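The same flag set can be assembled programmatically, e.g. when sweeping prompt variants or test files. This sketch only builds the argv list; the module path and flags are copied from the invocation above, while the helper itself is hypothetical:

```python
def build_eval_argv(checkpoint, test_file, save_preds,
                    prompt_variant="standard", num_samples=5):
    """Assemble the evaluate_coconut command line shown above as an argv list."""
    return [
        "python", "-m", "litereason.experiments.generalization.evaluate_coconut",
        "--checkpoint", str(checkpoint),
        "--base-model-id", "Qwen/Qwen2.5-7B-Instruct",
        "--mode", "coconut",
        "--c-thought", "1",
        "--max-latent-stage", "10",
        "--test-file", str(test_file),
        "--prompt-variant", prompt_variant,
        "--save-preds", str(save_preds),
        "--num-samples", str(num_samples),
        "--use-chat-template",
    ]
```

The list form can be passed directly to subprocess.run without shell quoting concerns.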

Training Configuration

Field                  Value
Project                coconut
Run name               qwen-coconut-ff-v2
Train path             ff_data/train.json
Val path               ff_data/val.json
Use chat template      True
Use boxed answers      True
c_thought              1
epochs_per_stage       2
max_latent_stage       10
Batch size / GPU       1
Gradient accumulation  64
num_epochs             14
lr                     5e-05
weight_decay           0.01
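With a per-GPU batch of 1 and gradient accumulation of 64, the effective batch per optimizer step works out as follows (single-GPU training is an assumption here, suggested by the single H200 in the run metadata):

```python
per_gpu_batch_size = 1      # "Batch size / GPU" above
gradient_accumulation = 64  # "Gradient accumulation" above
num_gpus = 1                # assumption: run metadata lists one NVIDIA H200

# Sequences contributing to each optimizer step
effective_batch = per_gpu_batch_size * gradient_accumulation * num_gpus
print(effective_batch)  # 64
```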

Logged Training Metrics

Metric       Value
eval/loss    0.588139960106383
train/loss   0.76171875
train/epoch  1
train/step   188

Attached Local Eval Artifacts

  • combined_eval_with_sem.json: accuracy=0.5064516129032258, total_samples=None, 95% CI=[0.4634663301219705, 0.5494368956844812]
  • ff_combined_eval.json: accuracy=0.6016129032258064, total_samples=620
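For the artifact that reports a sample count, a 95% interval can be recomputed from accuracy and n with a normal-approximation (Wald) formula. This is a sketch for sanity-checking, not necessarily the method that produced the CI attached above:

```python
import math


def wald_ci(accuracy, n, z=1.96):
    """95% normal-approximation CI for a binomial proportion."""
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width


# Numbers from ff_combined_eval.json above
lo, hi = wald_ci(0.6016129032258064, 620)
```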

Notes

  • Repo root contains the extracted standard HF model plus latent_checkpoint.pt for Coconut decoding.
  • The local extracted_hf_model README indicates the extracted model was derived from checkpoint_13.

Local Reference Paths

  • WandB config: /mnt/volume3/coconut/wandb/run-20251030_005318-qlivu0at/files/config.yaml
  • WandB summary: /mnt/volume3/coconut/wandb/run-20251030_005318-qlivu0at/files/wandb-summary.json
  • WandB metadata: /mnt/volume3/coconut/wandb/run-20251030_005318-qlivu0at/files/wandb-metadata.json
  • Standard-model extract dir: /mnt/disk/baseline_colar/hf_prepared/coconut_ff_v2/standard_model