---
datasets:
- kitti
library_name: pytorch
license: apache-2.0
pipeline_tag: depth-estimation
tags:
- deltatok
- cvpr2026-highlight
---

# Depth Head — KITTI

Monocular depth estimation head trained on KITTI (RMSE: 2.79). Part of [A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens](https://huggingface.co/papers/2604.04913) (CVPR 2026 Highlight).

## Usage

Requires a frozen [DINOv3](https://github.com/facebookresearch/dinov3) ViT-B backbone. See the [DeltaTok GitHub repository](https://github.com/amazon-far/deltatok) for training and evaluation code.

## Acknowledgements

- [DINOv3](https://github.com/facebookresearch/dinov3)
- [KITTI](https://www.cvlibs.net/datasets/kitti/)

## Citation

```bibtex
@inproceedings{kerssies2026deltatok,
  title     = {A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens},
  author    = {Kerssies, Tommie and Berton, Gabriele and He, Ju and Yu, Qihang and Ma, Wufei and de Geus, Daan and Dubbelman, Gijs and Chen, Liang-Chieh},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
```
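
## Metric sketch

The RMSE figure quoted at the top of this card is a root-mean-square error between predicted and ground-truth depth. The snippet below is a minimal sketch of how such a metric is commonly computed on sparse KITTI-style ground truth. It is not taken from the DeltaTok code: the function name `depth_rmse`, the convention that a ground-truth value of zero marks a missing measurement, and the 80 m depth cap are all assumptions made for illustration. The evaluation code in the DeltaTok repository is the reference for how the reported number was actually obtained.

```python
import numpy as np


def depth_rmse(pred, target, min_depth=1e-3, max_depth=80.0):
    """RMSE over valid ground-truth pixels only.

    Assumed conventions (hypothetical, not confirmed by the model card):
    - target == 0 marks pixels without a ground-truth measurement
    - pixels at or beyond max_depth are excluded from evaluation
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    # Keep only pixels that carry a usable ground-truth depth.
    valid = (target > min_depth) & (target < max_depth)
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")

    diff = pred[valid] - target[valid]
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy 2x2 example: one pixel has no ground truth and is ignored.
target = np.array([[10.0, 0.0], [20.0, 40.0]])
pred = np.array([[13.0, 5.0], [16.0, 40.0]])
rmse = depth_rmse(pred, target)
print(f"RMSE: {rmse:.3f}")
```

In the toy example, the errors on the three valid pixels are 3, -4, and 0, so the result is the square root of 25/3. Masking before averaging matters here because the ground truth is sparse: averaging over all pixels would count the missing measurements as errors.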