Community Evals Feedback

#1
by burtenshaw - opened

The Hub provides a decentralized system for tracking model evaluation results. Benchmark datasets host leaderboards, and model repos store evaluation scores that automatically appear on both the model page and the benchmark's leaderboard.


🔊 Let us know what you think of this feature:

Looks great! Could you provide instructions on how to run evaluation locally with Inspect AI on these 3 benchmarks?

OpenEvals org

Hey @djstrong you can run something like:

inspect eval hf/cais/hle --model hf/openai-community/gpt2

to run a local transformers model. The provider docs are here: https://inspect.aisi.org.uk/providers.html

@burtenshaw
Is it possible to configure a subset of the data to be closed and private? I think that would be super valuable.
