📋 Eval Logs Collection: Benchmark logs generated with Twinkle Eval, recording the model's outputs for each prompt. • 2 items • Updated 7 days ago
🏎️ Formosa-1 Series Collection: A collection of Formosa-1 (F1) reasoning models and datasets focused on Traditional Chinese instruction-following and logic. • 4 items • Updated 7 days ago
🧠 Traditional Chinese Reasoning Datasets Collection: A curated collection of datasets designed to evaluate and train reasoning capabilities in Traditional Chinese across various domains. • 3 items • Updated Oct 13, 2025