Reasoning Models Struggle to Control their Chains of Thought • Paper • 2603.05706 • Published 18 days ago • 34
FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use • Paper • 2603.08262 • Published 15 days ago • 39