- The Debugging Decay Index: Rethinking Debugging Strategies for Code LLMs
  Paper • 2506.18403 • Published • 3
- ReCode: Updating Code API Knowledge with Reinforcement Learning
  Paper • 2506.20495 • Published • 10
- SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
  Paper • 2507.23348 • Published • 12
- LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
  Paper • 2509.09614 • Published • 7
Collections including paper arxiv:2604.14228
- Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems
  Paper • 2604.14228 • Published • 25
- GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0)
  Paper • 2604.17091 • Published • 13
- Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
  Paper • 2504.19413 • Published • 52

- More Agents Is All You Need
  Paper • 2402.05120 • Published • 57
- UFO: A UI-Focused Agent for Windows OS Interaction
  Paper • 2402.07939 • Published • 17
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
  Paper • 2406.04151 • Published • 24
- AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
  Paper • 2407.04363 • Published • 34

- Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems
  Paper • 2604.14228 • Published • 25
- AutoDev: Automated AI-Driven Development
  Paper • 2403.08299 • Published • 15
- AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
  Paper • 2508.16279 • Published • 61

- LoRA+: Efficient Low Rank Adaptation of Large Models
  Paper • 2402.12354 • Published • 7
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 24
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 15
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69