ClawGym: A Scalable Framework for Building Effective Claw Agents Paper • 2604.26904 • Published 6 days ago • 47
Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing Paper • 2512.23611 • Published Dec 29, 2025 • 7
Context as a Tool: Context Management for Long-Horizon SWE-Agents Paper • 2512.22087 • Published Dec 26, 2025 • 4
Scaling Laws for Code: Every Programming Language Matters Paper • 2512.13472 • Published Dec 15, 2025 • 17
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression Paper • 2604.19572 • Published 14 days ago • 21
InCoder-32B-Thinking: Industrial Code World Model for Thinking Paper • 2604.03144 • Published Apr 3 • 233
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published Mar 17 • 310
McEval Collection McEval: Massively Multilingual Code Evaluation • 2 items • Updated Nov 11, 2024 • 1
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24, 2024 • 41
TableBench Collection TableBench: A Comprehensive and Complex Benchmark for Table Question Answering • 7 items • Updated Mar 2 • 3
FuzzCoder: Byte-level Fuzzing Test via Large Language Model Paper • 2409.01944 • Published Sep 3, 2024 • 45