Coconut Qwen2.5-7B Flawed Fictions v2

This repository contains a Coconut checkpoint packaged for both decoding modes:

  • standard decoding from the HF model files at repo root
  • Coconut latent decoding from latent_checkpoint.pt

A single snapshot_download() call retrieves everything needed for either evaluation path.

What Is In This Repo

  • Standard HF model files at repo root for Transformers / vLLM
  • latent_checkpoint.pt with the original Coconut checkpoint needed for latent decoding
  • latent_metadata.json with c_thought, max_latent_stage, provenance, and attached eval metadata
  • artifacts/source_wandb_config.yaml, artifacts/source_wandb_summary.json, and artifacts/source_wandb_metadata.json
  • artifacts/eval_* files copied from local evaluation outputs when available

Source Provenance

Field                  Value
WandB run              qlivu0at
Run date               2025-10-30
Task                   Flawed Fictions continuity error detection
Base model             Qwen/Qwen2.5-7B-Instruct
Git commit             db8d7fffcaac2bcdddae1f539ea5dea00996cd79
Host / GPU             alexgurung-fftest-fsgsd-gmrs7 / NVIDIA H200
Original checkpoint    /mnt/disk/coconut/checkpoints/qwen-coconut-ff-v2/checkpoint_13
Checkpoint size        14.2 GB
Generalization slug    coconut_ff_v2

Usage

Standard decoding

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_dir = "agurung/qwen-coconut-ff-v2"
model = AutoModelForCausalLM.from_pretrained(repo_dir, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_dir)

Coconut latent decoding

from huggingface_hub import snapshot_download
from pathlib import Path

local_dir = Path(snapshot_download("agurung/qwen-coconut-ff-v2"))
checkpoint = local_dir / "latent_checkpoint.pt"
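Once the snapshot is local, the latent checkpoint and its metadata live side by side at the repo root; a small helper (hypothetical, not part of this repo) can resolve both in one place:

```python
from pathlib import Path


def latent_artifacts(snapshot_dir):
    """Map the Coconut latent-decoding files inside a downloaded snapshot.

    `snapshot_dir` is whatever snapshot_download() returned; the file names
    match the repo layout described above.
    """
    root = Path(snapshot_dir)
    return {
        "checkpoint": root / "latent_checkpoint.pt",
        "metadata": root / "latent_metadata.json",
    }
```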

Example evaluator invocation:

python -m litereason.experiments.generalization.evaluate_coconut \
  --checkpoint "$LOCAL_DIR/latent_checkpoint.pt" \
  --base-model-id "Qwen/Qwen2.5-7B-Instruct" \
  --mode coconut \
  --c-thought 1 \
  --max-latent-stage 10 \
  --test-file litereason/experiments/generalization/data/gsm8k.jsonl \
  --prompt-variant standard \
  --save-preds preds_gsm8k_standard.jsonl \
  --num-samples 5 \
  --use-chat-template
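The same flag set can be assembled programmatically, e.g. when sweeping prompt variants or test files. This sketch only builds the argv list; the module path and flags are copied from the invocation above, while the helper itself is hypothetical:

```python
def build_eval_argv(checkpoint, test_file, save_preds,
                    prompt_variant="standard", num_samples=5):
    """Assemble the evaluate_coconut command line shown above as an argv list."""
    return [
        "python", "-m", "litereason.experiments.generalization.evaluate_coconut",
        "--checkpoint", str(checkpoint),
        "--base-model-id", "Qwen/Qwen2.5-7B-Instruct",
        "--mode", "coconut",
        "--c-thought", "1",
        "--max-latent-stage", "10",
        "--test-file", str(test_file),
        "--prompt-variant", prompt_variant,
        "--save-preds", str(save_preds),
        "--num-samples", str(num_samples),
        "--use-chat-template",
    ]
```

The list form can be passed directly to subprocess.run without shell quoting concerns.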

Training Configuration

Field                  Value
Project                coconut
Run name               qwen-coconut-ff-v2
Train path             ff_data/train.json
Val path               ff_data/val.json
Use chat template      True
Use boxed answers      True
c_thought              1
epochs_per_stage       2
max_latent_stage       10
Batch size / GPU       1
Gradient accumulation  64
num_epochs             14
lr                     5e-05
weight_decay           0.01
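With a per-GPU batch of 1 and gradient accumulation of 64, the effective batch per optimizer step works out as follows (single-GPU training is an assumption here, suggested by the single H200 in the run metadata):

```python
per_gpu_batch_size = 1      # "Batch size / GPU" above
gradient_accumulation = 64  # "Gradient accumulation" above
num_gpus = 1                # assumption: run metadata lists one NVIDIA H200

# Sequences contributing to each optimizer step
effective_batch = per_gpu_batch_size * gradient_accumulation * num_gpus
print(effective_batch)  # 64
```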

Logged Training Metrics

Metric       Value
eval/loss    0.588139960106383
train/loss   0.76171875
train/epoch  1
train/step   188

Attached Local Eval Artifacts

  • combined_eval_with_sem.json: accuracy=0.5064516129032258, total_samples=None, 95% CI=[0.4634663301219705, 0.5494368956844812]
  • ff_combined_eval.json: accuracy=0.6016129032258064, total_samples=620
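For the artifact that reports a sample count, a 95% interval can be recomputed from accuracy and n with a normal-approximation (Wald) formula. This is a sketch for sanity-checking, not necessarily the method that produced the CI attached above:

```python
import math


def wald_ci(accuracy, n, z=1.96):
    """95% normal-approximation CI for a binomial proportion."""
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - half_width, accuracy + half_width


# Numbers from ff_combined_eval.json above
lo, hi = wald_ci(0.6016129032258064, 620)
```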

Notes

  • Repo root contains the extracted standard HF model plus latent_checkpoint.pt for Coconut decoding.
  • The local extracted_hf_model README indicates the extracted model was derived from checkpoint_13.

Local Reference Paths

  • WandB config: /mnt/volume3/coconut/wandb/run-20251030_005318-qlivu0at/files/config.yaml
  • WandB summary: /mnt/volume3/coconut/wandb/run-20251030_005318-qlivu0at/files/wandb-summary.json
  • WandB metadata: /mnt/volume3/coconut/wandb/run-20251030_005318-qlivu0at/files/wandb-metadata.json
  • Standard-model extract dir: /mnt/disk/baseline_colar/hf_prepared/coconut_ff_v2/standard_model