Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process Paper • 2512.23988 • Published Dec 30, 2025 • 19
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time Paper • 2512.25075 • Published Dec 31, 2025 • 16
Guiding a Diffusion Transformer with the Internal Dynamics of Itself Paper • 2512.24176 • Published Dec 30, 2025 • 8
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published Dec 30, 2025 • 52
AdaGaR: Adaptive Gabor Representation for Dynamic Scene Reconstruction Paper • 2601.00796 • Published Jan 2 • 32
Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning Paper • 2512.24146 • Published Dec 30, 2025 • 14
SOP: A Scalable Online Post-Training System for Vision-Language-Action Models Paper • 2601.03044 • Published Jan 6 • 28
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 231
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models Paper • 2601.03425 • Published Jan 6 • 17
AgentOCR: Reimagining Agent History via Optical Self-Compression Paper • 2601.04786 • Published Jan 8 • 31
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors Paper • 2601.07226 • Published Jan 12 • 33
Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models Paper • 2601.07351 • Published Jan 12 • 26
Dr. Zero: Self-Evolving Search Agents without Training Data Paper • 2601.07055 • Published Jan 11 • 22
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale Paper • 2601.08225 • Published Jan 13 • 53
The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents Paper • 2601.07264 • Published Jan 12 • 24
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices Paper • 2601.08303 • Published Jan 13 • 20
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning Paper • 2601.09708 • Published Jan 14 • 55
Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments Paper • 2601.01075 • Published Jan 3 • 6
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published Jan 13 • 149
Alterbute: Editing Intrinsic Attributes of Objects in Images Paper • 2601.10714 • Published Jan 15 • 31
Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published Jan 14 • 34
Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text Paper • 2601.10355 • Published Jan 15 • 39
Language of Thought Shapes Output Diversity in Large Language Models Paper • 2601.11227 • Published Jan 16 • 9
More Images, More Problems? A Controlled Analysis of VLM Failure Modes Paper • 2601.07812 • Published Jan 12 • 6
Toward Efficient Agents: Memory, Tool learning, and Planning Paper • 2601.14192 • Published Jan 20 • 57
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published Jan 22 • 55
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models Paper • 2601.15224 • Published Jan 21 • 12
360Anything: Geometry-Free Lifting of Images and Videos to 360° Paper • 2601.16192 • Published Jan 22 • 9
MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences Paper • 2601.07251 • Published Jan 12 • 11
KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices Paper • 2601.21579 • Published Jan 29 • 6
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30 • 111
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning Paper • 2601.21468 • Published Jan 29 • 25
Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis Paper • 2602.03139 • Published Feb 3 • 44
Protein Autoregressive Modeling via Multiscale Structure Generation Paper • 2602.04883 • Published Feb 4 • 3
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration Paper • 2602.01734 • Published Feb 2 • 32
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger Paper • 2602.08222 • Published Feb 9 • 290
Reliable and Responsible Foundation Models: A Comprehensive Survey Paper • 2602.08145 • Published Feb 4 • 8
Col-Bandit: Zero-Shot Query-Time Pruning for Late-Interaction Retrieval Paper • 2602.02827 • Published Feb 2 • 3
The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context Paper • 2602.12108 • Published Feb 12 • 13
When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models Paper • 2602.10179 • Published Feb 10 • 6
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics Paper • 2602.12617 • Published Feb 13 • 20
Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks Paper • 2602.14689 • Published Feb 16 • 1
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers Paper • 2602.15322 • Published Feb 17 • 11
Visual Persuasion: What Influences Decisions of Vision-Language Models? Paper • 2602.15278 • Published Feb 17 • 3
The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems Paper • 2602.15382 • Published Feb 17 • 5
Causal-JEPA: Learning World Models through Object-Level Latent Interventions Paper • 2602.11389 • Published Feb 11 • 9
Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality Paper • 2602.14080 • Published Feb 15 • 21
Multi-agent cooperation through in-context co-player inference Paper • 2602.16301 • Published Feb 18 • 24
DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers Paper • 2602.16968 • Published Feb 19 • 12
CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing Paper • 2602.15823 • Published Feb 17 • 3
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published Feb 9 • 264
Spanning the Visual Analogy Space with a Weight Basis of LoRAs Paper • 2602.15727 • Published Feb 17 • 14
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published Feb 24 • 102
Test-Time Training with KV Binding Is Secretly Linear Attention Paper • 2602.21204 • Published Feb 24 • 32
Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking Paper • 2602.21196 • Published Feb 24 • 7
The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum Paper • 2602.21185 • Published Feb 24 • 4
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference Paper • 2602.21548 • Published Feb 25 • 52
From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors Paper • 2602.21778 • Published Feb 25 • 14
SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models Paper • 2602.18993 • Published Feb 22 • 4
Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting Paper • 2602.20933 • Published Feb 24 • 4
Causal Motion Diffusion Models for Autoregressive Motion Generation Paper • 2602.22594 • Published Feb 26 • 7
AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning Paper • 2602.23258 • Published Feb 26 • 28
Mode Seeking meets Mean Seeking for Fast Long Video Generation Paper • 2602.24289 • Published Feb 27 • 41
LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding Paper • 2602.23881 • Published Feb 27 • 18
How to Take a Memorable Picture? Empowering Users with Actionable Feedback Paper • 2602.21877 • Published Feb 25 • 16
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published Mar 3 • 104
RealWonder: Real-Time Physical Action-Conditioned Video Generation Paper • 2603.05449 • Published Mar 5 • 12
Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling Paper • 2603.04553 • Published Mar 4 • 3
Dynamic Model Routing and Cascading for Efficient LLM Inference: A Survey Paper • 2603.04445 • Published 24 days ago • 5
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published Mar 10 • 53
Lost in Backpropagation: The LM Head is a Gradient Bottleneck Paper • 2603.10145 • Published Mar 10 • 13
Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers Paper • 2603.10744 • Published Mar 11 • 7
HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios Paper • 2603.11975 • Published Mar 12 • 11
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published Mar 16 • 153
WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation Paper • 2603.15132 • Published Mar 16 • 35
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning Paper • 2603.14482 • Published Mar 15 • 34
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents Paper • 2603.18815 • Published Mar 19 • 14
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting Paper • 2603.25745 • Published Mar 26 • 16
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation Paper • 2603.25702 • Published Mar 26 • 8
Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models Paper • 2603.24844 • Published Mar 25 • 10
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published Mar 24 • 62
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents Paper • 2603.22386 • Published Mar 23 • 57
Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD Paper • 2603.20155 • Published Mar 20 • 10
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 289
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering Paper • 2604.08224 • Published Apr 9 • 51
Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills Paper • 2604.05333 • Published Apr 7 • 22
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment Paper • 2604.06377 • Published Apr 7 • 7
Combee: Scaling Prompt Learning for Self-Improving Language Model Agents Paper • 2604.04247 • Published Apr 5 • 31
TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders Paper • 2604.07340 • Published Apr 8 • 17
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 501
FileGram: Grounding Agent Personalization in File-System Behavioral Traces Paper • 2604.04901 • Published Apr 6 • 40
Scaling Teams or Scaling Time? Memory Enabled Lifelong Learning in LLM Multi-Agent Systems Paper • 2604.03295 • Published Mar 27 • 10
HDP: A Lightweight Cryptographic Protocol for Human Delegation Provenance in Agentic AI Systems Paper • 2604.04522 • Published Apr 6 • 10
Do World Action Models Generalize Better than VLAs? A Robustness Study Paper • 2603.22078 • Published Apr 1 • 7
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published Apr 2 • 151
FlowSlider: Training-Free Continuous Image Editing via Fidelity-Steering Decomposition Paper • 2604.02088 • Published Apr 2 • 6
Signals: Trajectory Sampling and Triage for Agentic Interactions Paper • 2604.00356 • Published Apr 1 • 8
ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers Paper • 2603.24414 • Published Mar 25 • 183
Consistency Amplifies: How Behavioral Variance Shapes Agent Accuracy Paper • 2603.25764 • Published Mar 26 • 5
Semantic Richness or Geometric Reasoning? The Fragility of VLM's Visual Invariance Paper • 2604.01848 • Published Apr 3 • 5
EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers Paper • 2604.09130 • Published Apr 10 • 4
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details Paper • 2604.06870 • Published Apr 8 • 41
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music Paper • 2604.10905 • Published Apr 13 • 28
From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models Paper • 2604.09459 • Published Apr 13 • 13
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2604.12374 • Published Apr 14 • 36
Accelerating Speculative Decoding with Block Diffusion Draft Trees Paper • 2604.12989 • Published Apr 14 • 8
LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling Paper • 2604.11748 • Published about 1 month ago • 14
Elucidating the SNR-t Bias of Diffusion Probabilistic Models Paper • 2604.16044 • Published 28 days ago • 74
Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips Paper • 2502.07408 • Published 29 days ago • 59
Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language Paper • 2604.19667 • Published 24 days ago • 22
CityRAG: Stepping Into a City via Spatially-Grounded Video Generation Paper • 2604.19741 • Published 24 days ago • 17
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges Paper • 2604.13602 • Published about 1 month ago • 32
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 18 days ago • 70
Efficient Training on Multiple Consumer GPUs with RoundPipe Paper • 2604.27085 • Published 16 days ago • 40
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence Paper • 2604.24954 • Published 18 days ago • 22
Synthetic Computers at Scale for Long-Horizon Productivity Simulation Paper • 2604.28181 • Published 15 days ago • 18
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization Paper • 2604.24952 • Published 18 days ago • 6
From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills Paper • 2604.24026 • Published 18 days ago • 21
Leveraging Verifier-Based Reinforcement Learning in Image Editing Paper • 2604.27505 • Published 15 days ago • 57