Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? • arXiv:2603.24472 • Published 14 days ago
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation • arXiv:2603.19220 • Published 20 days ago
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs • arXiv:2602.03048 • Published Feb 3
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning • arXiv:2602.01058 • Published Feb 1
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text • arXiv:2601.22975 • Published Jan 30
The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving • arXiv:2601.00747 • Published Jan 2
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling • arXiv:2512.23959 • Published Dec 30, 2025
Coupled Variational Reinforcement Learning for Language Model General Reasoning • arXiv:2512.12576 • Published Dec 14, 2025
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models • arXiv:2512.02556 • Published Dec 2, 2025
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning • arXiv:2509.02479 • Published Sep 2, 2025
GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks • arXiv:2504.12764 • Published Apr 17, 2025
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations • arXiv:2504.10481 • Published Apr 14, 2025
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models • arXiv:2502.01142 • Published Feb 3, 2025
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models • arXiv:2501.01830 • Published Jan 3, 2025
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering • arXiv:2411.11504 • Published Nov 18, 2024