VP-VLA
Official checkpoints for VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models
This repository contains the VP-VLA policy checkpoint trained for RoboCasa tabletop manipulation.
VP-VLA uses visual prompts as an interface for vision-language-action models: a high-level planner converts language instructions into visual prompts, and the policy follows those prompts to produce robot actions.
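The two-stage interface described above can be sketched as follows. This is a minimal illustrative sketch, not the released VP-VLA API: the class and function names (`VisualPrompt`, `plan`, `act`) and the 7-dimensional action are assumptions for illustration, and both stages are stubbed.

```python
# Hypothetical sketch of the VP-VLA two-stage interface.
# Names and shapes are illustrative, not the released API.
from dataclasses import dataclass
from typing import List


@dataclass
class VisualPrompt:
    """A 2D annotation drawn onto the camera image (e.g. a target point)."""
    x: float
    y: float
    label: str


def plan(instruction: str) -> List[VisualPrompt]:
    # High-level planner: converts a language instruction into visual prompts.
    # Stubbed here with a single centered point; the real planner is learned.
    return [VisualPrompt(x=0.5, y=0.5, label=instruction)]


def act(prompts: List[VisualPrompt]) -> List[float]:
    # Low-level policy: follows the visual prompts to produce a robot action.
    # Stubbed as a zero 7-DoF end-effector command.
    return [0.0] * 7


prompts = plan("pick up the mug")
action = act(prompts)
```

The key design point is that the planner and policy communicate only through the visual prompts, so either stage can be swapped out independently.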
Use this checkpoint with the released VP-VLA codebase: follow the installation and evaluation instructions in the VP-VLA repository, then pass this checkpoint path to the RoboCasa tabletop evaluation script.
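Assuming a local clone of the VP-VLA repository, the evaluation step might look like the following. The script name and checkpoint filename are illustrative placeholders, not the exact released entry points; take the real ones from the repository's evaluation instructions.

```shell
# Hypothetical invocation; substitute the actual evaluation script and the
# path where you downloaded this checkpoint.
CKPT=checkpoints/vp_vla_robocasa_tabletop.pt
python eval_robocasa_tabletop.py --checkpoint "$CKPT"
```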
If you use this model, please cite the VP-VLA paper: