-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 189 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
Collections
Discover the best community collections!
Collections including paper arxiv:2507.19849
-
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Paper • 2507.19457 • Published • 28 -
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 158 -
Group Sequence Policy Optimization
Paper • 2507.18071 • Published • 316 -
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Paper • 2510.03215 • Published • 97
-
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper • 2508.13167 • Published • 129 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 96 -
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Paper • 2511.16043 • Published • 108 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104
-
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Paper • 2507.21183 • Published • 14 -
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Paper • 2507.21802 • Published • 17 -
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity
Paper • 2507.21848 • Published • 8 -
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 158
-
Agentic Reinforced Policy Optimization
Paper • 2507.19849 • Published • 158 -
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
Paper • 2507.22448 • Published • 68 -
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Paper • 2508.18265 • Published • 211 -
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
Paper • 2508.21113 • Published • 110