Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models Paper • 2510.04618 • Published Oct 6 • 125
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published Aug 6 • 129
Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video Paper • 2507.00339 • Published Jul 1 • 12
MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning Paper • 2506.22992 • Published Jun 28 • 12
Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention Paper • 2506.23542 • Published Jun 30 • 13
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published Jun 20 • 64
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Paper • 2506.05573 • Published Jun 5 • 81
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 147
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published Jun 8 • 113
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30 • 277
Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency Paper • 2506.08343 • Published Jun 10 • 54
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective Paper • 2506.14965 • Published Jun 17 • 49
Running on Zero MCP Featured 561 Image to Music v2 🎺 561 Get a music sample inspired by the mood of an image
view article Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data +7 Jun 3 • 299
view article Article nanoVLM: The simplest repository to train your VLM in pure PyTorch +5 May 21 • 244
Running on CPU Upgrade 33 Blazing Fast Whisper 👁 33 Blazing Fast Whisper Deployed on HF Inference Endpoints