SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published 17 days ago • 53
Watch and Learn: Learning to Use Computers from Online Videos Paper • 2510.04673 • Published Oct 6, 2025 • 12
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL Paper • 2512.04069 • Published Dec 3, 2025 • 24
BIOCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models Paper • 2510.20095 • Published Oct 23, 2025 • 1
CE-Bench: Towards a Reliable Contrastive Evaluation Benchmark of Interpretability of Sparse Autoencoders Paper • 2509.00691 • Published Aug 31, 2025 • 2
kabr-tools: Automated Framework for Multi-Species Behavioral Monitoring Paper • 2510.02030 • Published Oct 2, 2025
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds Paper • 2508.14879 • Published Aug 20, 2025 • 69
BIOCLIP: A Vision Foundation Model for the Tree of Life Paper • 2311.18803 • Published Nov 30, 2023 • 1
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images Paper • 2407.08027 • Published Jul 10, 2024
VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images Paper • 2408.16176 • Published Aug 28, 2024 • 8
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning Paper • 2505.23883 • Published May 29, 2025 • 2
An Illusion of Progress? Assessing the Current State of Web Agents Paper • 2504.01382 • Published Apr 2, 2025 • 4
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge Paper • 2506.21506 • Published Jun 26, 2025 • 52
Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis Paper • 2501.09333 • Published Jan 16, 2025 • 1
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models Paper • 2212.04088 • Published Dec 8, 2022
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics Paper • 2411.16537 • Published Nov 25, 2024