Tracks performance of LLMs, VLMs and agents on web navigation tasks
Explore agent trajectories and judgments in web benchmarks
Generate Hebrew speech from text