M Saad Salman

MSS444

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 days ago

When Does Sparsity Mitigate the Curse of Depth in LLMs

upvoted a paper 2 days ago

Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

upvoted a paper 2 days ago

Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models

Organizations

None yet

upvoted 10 papers 2 days ago

When Does Sparsity Mitigate the Curse of Depth in LLMs

Paper • 2603.15389 • Published 3 days ago • 5

Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

Paper • 2603.15611 • Published 3 days ago • 10

Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models

Paper • 2603.13985 • Published 5 days ago • 9

Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty

Paper • 2603.15500 • Published 3 days ago • 11

TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning

Paper • 2603.12529 • Published 7 days ago • 18

EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings

Paper • 2603.13594 • Published 6 days ago • 138

Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context

Paper • 2603.15653 • Published 3 days ago • 4

upvoted 8 papers 3 days ago

RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

Paper • 2603.08561 • Published 10 days ago • 12

Lost in Backpropagation: The LM Head is a Gradient Bottleneck

Paper • 2603.10145 • Published 9 days ago • 11

ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning

Paper • 2603.10160 • Published 9 days ago • 25

OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published 9 days ago • 130

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Paper • 2603.04597 • Published 15 days ago • 204

Automatic Generation of High-Performance RL Environments

Paper • 2603.12145 • Published 7 days ago • 6

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

Paper • 2603.10444 • Published 9 days ago • 10

CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges

Paper • 2603.11863 • Published 8 days ago • 6

upvoted 2 papers 7 days ago

Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining

Paper • 2603.11103 • Published 9 days ago • 8

Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training

Paper • 2603.12246 • Published 7 days ago • 4
