VP-VLA
Official checkpoints for VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
This repository contains the VP-VLA policy checkpoint trained for RoboCasa tabletop manipulation.
VP-VLA uses visual prompts as an interface for vision-language-action models: a high-level planner converts language instructions into visual prompts, and the policy follows those prompts to produce robot actions.
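The two-stage interface described above can be sketched as follows. This is a minimal illustrative sketch, not the released VP-VLA API: the class and function names (`VisualPrompt`, `plan`, `act`) and the 7-dimensional action are assumptions for illustration, and both stages are stubbed.

```python
# Hypothetical sketch of the VP-VLA two-stage interface.
# Names and shapes are illustrative, not the released API.
from dataclasses import dataclass
from typing import List


@dataclass
class VisualPrompt:
    """A 2D annotation drawn onto the camera image (e.g. a target point)."""
    x: float
    y: float
    label: str


def plan(instruction: str) -> List[VisualPrompt]:
    # High-level planner: converts a language instruction into visual prompts.
    # Stubbed here with a single centered point; the real planner is learned.
    return [VisualPrompt(x=0.5, y=0.5, label=instruction)]


def act(prompts: List[VisualPrompt]) -> List[float]:
    # Low-level policy: follows the visual prompts to produce a robot action.
    # Stubbed as a zero 7-DoF end-effector command.
    return [0.0] * 7


prompts = plan("pick up the mug")
action = act(prompts)
```

The key design point is that the planner and policy communicate only through the visual prompts, so either stage can be swapped out independently.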
Use this checkpoint with the released VP-VLA codebase: follow the installation and evaluation instructions in the VP-VLA repository, then pass this checkpoint path to the RoboCasa tabletop evaluation script.
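Assuming a local clone of the VP-VLA repository, the evaluation step might look like the following. The script name and checkpoint filename are illustrative placeholders, not the exact released entry points; take the real ones from the repository's evaluation instructions.

```shell
# Hypothetical invocation; substitute the actual evaluation script and the
# path where you downloaded this checkpoint.
CKPT=checkpoints/vp_vla_robocasa_tabletop.pt
python eval_robocasa_tabletop.py --checkpoint "$CKPT"
```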
If you use this model, please cite the VP-VLA paper: