Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding Paper • 2508.13804 • Published Aug 19 • 3
SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search Paper • 2507.15245 • Published Jul 21 • 11
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published Jul 1 • 46
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents Paper • 2506.11763 • Published Jun 13 • 73