LlamaOFT · LIBERO (all 4 suites, joint training, 80 k steps)

Vision-Language-Action (VLA) checkpoint released with the AlphaBrain framework. Trained jointly on all four LIBERO suites — Goal, Spatial, Object, and Long — for direct evaluation across the full LIBERO benchmark without retraining.

LlamaOFT couples a Llama-3.2-11B-Vision VLM with a DiT-B regression action head (action_dim=7, horizon=8). This release is the 80 000-step checkpoint of a 150 000-step budget run on LIBERO (dataset_mix=libero_all), and is the strongest multi-task LlamaOFT checkpoint in the AlphaBrain family on LIBERO.
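With horizon=8 and action_dim=7, each forward pass regresses a chunk of eight 7-DoF actions rather than a single step. A shape-only sketch (NumPy stand-in for the real DiT-B head; the 6-DoF-pose-plus-gripper interpretation of the 7 dimensions is an assumption, not stated in this card):

```python
import numpy as np

HORIZON, ACTION_DIM = 8, 7  # from the card: horizon=8, action_dim=7

# Stand-in for the DiT-B regression head's output: one forward pass yields a
# chunk of HORIZON consecutive actions, each a 7-D command (commonly a 6-DoF
# end-effector delta plus a gripper channel -- an assumption here).
action_chunk = np.zeros((HORIZON, ACTION_DIM), dtype=np.float32)
print(action_chunk.shape)  # (8, 7)
```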

Overview

| Field | Value |
| --- | --- |
| Architecture | LlamaOFT (Llama 3.2 Vision 11B + DiT-B regression head) |
| Base VLM | meta-llama/Llama-3.2-11B-Vision-Instruct |
| Action head | DiT-B · hidden_size=4096, action_dim=7, state_dim=7, horizon 8 |
| Training data | LIBERO · all 4 suites (Goal + Spatial + Object + Long) · dataset_mix=libero_all |
| Training type | Supervised fine-tuning (single run; not continual learning) |
| Attention | SDPA |
| Optimiser | AdamW · cosine-with-min-lr |
| Step budget | 80 000 (this release) / 150 000 planned |
| Hardware / batch | 4 × A800 80 GB · per_device_batch = 4 · grad_accum = 8 · effective batch = 128 |
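The effective batch size in the table follows directly from the other three numbers:

```python
# Effective batch size implied by the hardware/batch row above:
# GPUs x per-device batch x gradient-accumulation steps.
gpus, per_device_batch, grad_accum = 4, 4, 8
effective_batch = gpus * per_device_batch * grad_accum
print(effective_batch)  # 128
```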

Results

Evaluated on all 4 LIBERO suites, 50 rollouts per task × 10 tasks per suite = 500 episodes per suite.

| Suite | Success Rate |
| --- | --- |
| LIBERO-Goal | 97.2 % |
| LIBERO-Spatial | 92.4 % |
| LIBERO-Object | 99.4 % |
| LIBERO-10 (Long) | 82.6 % |
| Avg (4-suite) | 92.9 % |
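The 4-suite average is the unweighted mean of the per-suite rates:

```python
# Per-suite success rates from the table above (percent).
rates = {
    "LIBERO-Goal": 97.2,
    "LIBERO-Spatial": 92.4,
    "LIBERO-Object": 99.4,
    "LIBERO-10 (Long)": 82.6,
}
avg = sum(rates.values()) / len(rates)
print(round(avg, 1))  # 92.9
```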

Files

├── README.md                   model card
├── framework_config.yaml       AlphaBrain framework configuration
├── dataset_statistics.json     action normalization statistics
├── model.safetensors           full VLA weights (~21 GB, Llama 11B + DiT-B + DINO)
├── resume_meta.json            training metadata (completed_steps=80000, effective_bs=128)
└── llama_pretrained/           Llama-3.2-Vision tokenizer + chat_template + preprocessor configs
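The dataset_statistics.json file carries the action-normalization statistics needed to map model outputs back to real robot commands. A minimal sketch of how such a file is typically consumed; the key names (`"action"`, `"q01"`, `"q99"`) and the [-1, 1] normalization convention are assumptions, so check the shipped file for the actual schema:

```python
import json
import numpy as np

def unnormalize(actions, stats_path="dataset_statistics.json"):
    """Map normalized model outputs back to the original action range.

    Assumes (not confirmed by this card) that actions were normalized to
    [-1, 1] using per-dimension 1st/99th-percentile bounds stored under
    stats["action"]["q01"] / stats["action"]["q99"].
    """
    with open(stats_path) as f:
        stats = json.load(f)["action"]
    low = np.asarray(stats["q01"], dtype=np.float64)
    high = np.asarray(stats["q99"], dtype=np.float64)
    return 0.5 * (np.asarray(actions) + 1.0) * (high - low) + low
```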

Usage

git clone https://github.com/AlphaBrainGroup/AlphaBrain.git
cd AlphaBrain
pip install -e .

export PRETRAINED_MODELS_DIR=/path/to/models   # must contain Llama-3.2-11B-Vision-Instruct/

huggingface-cli download AlphaBrainGroup/llamaoft-libero-all4suite \
    --local-dir ./llamaoft_libero_all

python deployment/model_server/server_policy.py \
    --ckpt_path ./llamaoft_libero_all --port 10093 --use_bf16
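Once server_policy.py is listening, a client sends one observation per control step and receives an action chunk back. The server's wire format is not documented in this card, so the payload keys below (image bytes, language instruction, 7-D proprio state) are illustrative assumptions only, not the real AlphaBrain API:

```python
import base64
import json
import numpy as np

def build_request(image, instruction, state):
    """Pack one observation into a JSON-serializable payload.

    Hypothetical schema: the real AlphaBrain policy server may use different
    key names or an encoding other than base64 -- consult server_policy.py.
    """
    return {
        "image": base64.b64encode(
            np.asarray(image, dtype=np.uint8).tobytes()
        ).decode("ascii"),
        "instruction": instruction,
        "state": np.asarray(state, dtype=np.float32).tolist(),  # 7-D proprio
    }

payload = build_request(
    np.zeros((224, 224, 3)), "put the bowl on the plate", np.zeros(7)
)
print(len(json.dumps(payload)))  # payload is JSON-serializable
```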

To evaluate on any of the 4 LIBERO suites, see the LIBERO evaluation pipeline in the AlphaBrain repository.

Reproduction

bash scripts/run_base_vla/train.sh llama_oft_all_150k

Expect multi-day training on 4 × A800 80 GB for the full 150 000-step schedule. The shipped framework_config.yaml is the exact training configuration used for this checkpoint.

Notes

  • Joint-training baseline, not continual learning.
  • Attention: SDPA — chosen so the checkpoint loads without a pinned flash-attn wheel. Users can override to flash_attention_2 via --framework.llamavl.attn_implementation=flash_attention_2 if available.
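The SDPA-by-default, flash-attention-if-available choice maps onto the standard `attn_implementation` loading argument in Hugging Face transformers. A minimal selection sketch (only the fallback logic; actually loading the 11B checkpoint is omitted here):

```python
import importlib.util

# Prefer FlashAttention-2 when its wheel is importable; otherwise fall back
# to PyTorch SDPA, which needs no extra dependency. Pass the result as
# `attn_implementation` to `from_pretrained`, or via the framework flag above.
attn = "flash_attention_2" if importlib.util.find_spec("flash_attn") else "sdpa"
load_kwargs = {"attn_implementation": attn, "torch_dtype": "bfloat16"}
print(load_kwargs["attn_implementation"])
```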

License

MIT — see the parent repository.

Citation

@misc{alphabrain2026,
  title  = {AlphaBrain: A Modular Open-Source Framework for Embodied Intelligence Research},
  author = {AlphaBrain Team},
  year   = {2026},
  url    = {https://github.com/AlphaBrainGroup/AlphaBrain}
}