An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications Paper • 2509.19185 • Published Sep 23, 2025
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them Paper • 2509.21117 • Published Sep 25, 2025