NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents Paper • 2512.12730 • Published Dec 14, 2025 • 44
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows Paper • 2512.13168 • Published Dec 15, 2025 • 50
WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment Paper • 2512.12692 • Published Dec 14, 2025 • 14
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 9 days ago • 95
view article Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face +3 Jul 29, 2025 • 211
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7, 2025 • 107
WhiteRabbitNeo-V3 Collection The latest and most capable cybersecurity model we've ever created • 1 item • Updated Jun 25, 2025 • 17
Rethinking Verification for LLM Code Generation: From Generation to Testing Paper • 2507.06920 • Published Jul 9, 2025 • 29
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Paper • 2506.21551 • Published Jun 26, 2025 • 28
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development Paper • 2506.05010 • Published Jun 5, 2025 • 80
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation Paper • 2506.09790 • Published Jun 11, 2025 • 53
Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs Paper • 2506.05629 • Published Jun 5, 2025 • 37