🏆 Evaluation Results

#1
by nathanael-fijalkow - opened
MVA+IASD LLM for code and proof org

Evaluation Results

Model: LLM-course/simple_tokenizer
Parameters: 861,184 [PASS]
Chess library check: [PASS]

Performance

Metric                        Value
Total moves played            500
Games played                  26
Legal moves (first try)       76 (15.2%)
Legal moves (with retries)    172 (34.4%)
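
The percentages above appear to be the legal-move counts divided by the total number of moves played. As a minimal sketch of that arithmetic (the function name `legal_rate` is hypothetical and not part of the evaluation script, whose code is not shown here):

```python
def legal_rate(legal_moves: int, total_moves: int) -> float:
    """Return the share of legal moves as a percentage of all moves played."""
    if total_moves <= 0:
        raise ValueError("total_moves must be positive")
    return 100.0 * legal_moves / total_moves


# Counts taken from the table above.
first_try = legal_rate(76, 500)
with_retries = legal_rate(172, 500)

print(f"Legal moves (first try): {first_try:.1f}%")
print(f"Legal moves (with retries): {with_retries:.1f}%")
```

Under this reading, retries raise the count of legal moves from 76 to 172 out of the same 500 moves played.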

Interpretation

  • >90% legal rate: Excellent! Model has learned chess rules well.
  • 70-90% legal rate: Good, but room for improvement.
  • <70% legal rate: Model struggles with legal move generation.
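
The bands above can be sketched as a small classifier. This is an illustration only: the function name `interpret_legal_rate` is hypothetical, and treating exactly 70% and exactly 90% as part of the middle band is an assumption, since the list does not say which band the boundaries belong to.

```python
def interpret_legal_rate(rate_percent: float) -> str:
    """Map a legal-move rate (in percent) to the interpretation bands."""
    if not 0.0 <= rate_percent <= 100.0:
        raise ValueError("rate_percent must be between 0 and 100")
    if rate_percent > 90.0:
        return "Excellent! Model has learned chess rules well."
    # Assumption: the 70-90% band includes both endpoints.
    if rate_percent >= 70.0:
        return "Good, but room for improvement."
    return "Model struggles with legal move generation."


# Rates reported in the Performance table.
for label, rate in [("first try", 15.2), ("with retries", 34.4)]:
    print(f"{label}: {rate}% -> {interpret_legal_rate(rate)}")
```

Both reported rates, 15.2% on the first try and 34.4% with retries, fall in the lowest band.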
