# Coconut Qwen2.5-7B Flawed Fictions v2

This repository contains a Coconut checkpoint packaged for both decoding modes:

- standard decoding from the HF model files at repo root
- Coconut latent decoding from `latent_checkpoint.pt`

The goal is that a single `snapshot_download()` is enough for either evaluation path.
## What Is In This Repo

- Standard HF model files at repo root for Transformers / vLLM
- `latent_checkpoint.pt` with the original Coconut checkpoint needed for latent decoding
- `latent_metadata.json` with `c_thought`, `max_latent_stage`, provenance, and attached eval metadata
- `artifacts/source_wandb_config.yaml`, `artifacts/source_wandb_summary.json`, and `artifacts/source_wandb_metadata.json`
- `artifacts/eval_*` files copied from local evaluation outputs when available
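As a minimal sketch of reading the latent-decoding hyperparameters out of `latent_metadata.json` (only `c_thought` and `max_latent_stage` are documented keys; the local-directory path is a placeholder for a `snapshot_download()` result):

```python
import json
from pathlib import Path

local_dir = Path(".")  # replace with the directory returned by snapshot_download(...)
metadata_path = local_dir / "latent_metadata.json"

if metadata_path.exists():
    metadata = json.loads(metadata_path.read_text())
    # c_thought and max_latent_stage parameterize the latent decoding loop.
    print(metadata.get("c_thought"), metadata.get("max_latent_stage"))
```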
## Source Provenance
| Field | Value |
|---|---|
| WandB run | qlivu0at |
| Run date | 2025-10-30 |
| Task | Flawed Fictions continuity error detection |
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Git commit | db8d7fffcaac2bcdddae1f539ea5dea00996cd79 |
| Host / GPU | alexgurung-fftest-fsgsd-gmrs7 / NVIDIA H200 |
| Original checkpoint | /mnt/disk/coconut/checkpoints/qwen-coconut-ff-v2/checkpoint_13 |
| Checkpoint size | 14.2 GB |
| Generalization slug | coconut_ff_v2 |
## Usage

### Standard decoding

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_dir = "agurung/qwen-coconut-ff-v2"
model = AutoModelForCausalLM.from_pretrained(repo_dir, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_dir)
```
### Coconut latent decoding

```python
from pathlib import Path

from huggingface_hub import snapshot_download

local_dir = Path(snapshot_download("agurung/qwen-coconut-ff-v2"))
checkpoint = local_dir / "latent_checkpoint.pt"
```
Example evaluator invocation:

```shell
python -m litereason.experiments.generalization.evaluate_coconut \
    --checkpoint "$LOCAL_DIR/latent_checkpoint.pt" \
    --base-model-id "Qwen/Qwen2.5-7B-Instruct" \
    --mode coconut \
    --c-thought 1 \
    --max-latent-stage 10 \
    --test-file litereason/experiments/generalization/data/gsm8k.jsonl \
    --prompt-variant standard \
    --save-preds preds_gsm8k_standard.jsonl \
    --num-samples 5 \
    --use-chat-template
```
## Training Configuration
| Field | Value |
|---|---|
| Project | coconut |
| Run name | qwen-coconut-ff-v2 |
| Train path | ff_data/train.json |
| Val path | ff_data/val.json |
| Use chat template | True |
| Use boxed answers | True |
| c_thought | 1 |
| epochs_per_stage | 2 |
| max_latent_stage | 10 |
| Batch size / GPU | 1 |
| Gradient accumulation | 64 |
| num_epochs | 14 |
| lr | 5e-05 |
| weight_decay | 0.01 |
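The per-GPU batch size and gradient accumulation above combine into the effective batch size per optimizer step (assuming a single GPU, as the provenance table suggests):

```python
per_device_batch_size = 1   # "Batch size / GPU" above
gradient_accumulation = 64  # "Gradient accumulation" above

# Examples consumed per optimizer step on one GPU.
effective_batch_size = per_device_batch_size * gradient_accumulation
print(effective_batch_size)  # → 64
```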
## Logged Training Metrics

| Metric | Value |
|---|---|
| eval/loss | 0.588139960106383 |
| train/loss | 0.76171875 |
| train/epoch | 1 |
| train/step | 188 |
## Attached Local Eval Artifacts

- `combined_eval_with_sem.json`: accuracy=0.5064516129032258, total_samples=None, 95% CI=[0.4634663301219705, 0.5494368956844812]
- `ff_combined_eval.json`: accuracy=0.6016129032258064, total_samples=620
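A common way such an interval is derived is the normal-approximation (Wald) interval for a proportion; whether these eval files used exactly this method is not stated. A minimal sketch, illustrated with the `ff_combined_eval.json` accuracy and sample count above:

```python
import math

def wald_ci_95(accuracy: float, n: int) -> tuple[float, float]:
    """95% confidence interval for a proportion via the normal approximation."""
    se = math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - 1.96 * se, accuracy + 1.96 * se

# Illustration with the ff_combined_eval.json numbers above.
lo, hi = wald_ci_95(0.6016129032258064, 620)
print(f"[{lo:.4f}, {hi:.4f}]")
```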
## Notes

- Repo root contains the extracted standard HF model plus `latent_checkpoint.pt` for Coconut decoding.
- The local `extracted_hf_model` README indicates the existing extracted model was derived from `checkpoint_13`.
## Local Reference Paths

- WandB config: `/mnt/volume3/coconut/wandb/run-20251030_005318-qlivu0at/files/config.yaml`
- WandB summary: `/mnt/volume3/coconut/wandb/run-20251030_005318-qlivu0at/files/wandb-summary.json`
- WandB metadata: `/mnt/volume3/coconut/wandb/run-20251030_005318-qlivu0at/files/wandb-metadata.json`
- Standard-model extract dir: `/mnt/disk/baseline_colar/hf_prepared/coconut_ff_v2/standard_model`