Models: 70,261

Active filters: reinforcement-learning

nvidia/NitroGen

Reinforcement Learning • Updated 8 days ago • 497

Adilbai/stock-trading-rl-agent

Reinforcement Learning • Updated Jan 8 • 110 • 104

hkust-nlp/drkernel-14b

Text Generation • 15B • Updated 7 days ago • 25 • 4

Snowflake/Arctic-AWM-14B

Reinforcement Learning • 15B • Updated 2 days ago • 27 • 4

lightx2v/Wan2.1-T2V-1.3B-longcat-step1500

Text-to-Video • Updated 2 days ago • 27 • 4

LightningRodLabs/future-as-label-paper-step160

Reinforcement Learning • 33B • Updated 28 days ago • 27 • 3

ChengyuDu0123/HER-32B

Text Generation • Updated 9 days ago • 95 • 14

Snowflake/Arctic-AWM-4B

Reinforcement Learning • 4B • Updated 2 days ago • 26 • 3

Snowflake/Arctic-AWM-8B

Reinforcement Learning • 8B • Updated 2 days ago • 20 • 3

lightx2v/Wan2.1-T2V-1.3B-longcat-step500

Text-to-Video • Updated 2 days ago • 21 • 3

lightx2v/Wan2.1-T2V-1.3B-longcat-step1000

Text-to-Video • Updated 2 days ago • 9 • 3

ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q8

Reinforcement Learning • 8B • Updated Mar 28, 2025 • 4.49k • 193

JonusNattapong/AI-XAUUSD-Trading

Reinforcement Learning • Updated Oct 10, 2025 • 17

shiviktech/Trident

Text Generation • 4B • Updated Jan 7 • 77 • 2

zai-org/GLM-TTS

Text-to-Speech • Updated Jan 12 • 381 • 314

MING-ZCH/MetaphorStar-32B

Image-Text-to-Text • 33B • Updated about 4 hours ago • 22 • 2

dayll/SEAD-14B

Text Generation • 15B • Updated 4 days ago • 24 • 3

hkust-nlp/drkernel-8b

Text Generation • 8B • Updated 7 days ago • 107 • 2

ParamTatva/sanskrit-ppo-hopper-v5

Reinforcement Learning • Updated 5 days ago • 2

sb3/ppo-BipedalWalkerHardcore-v3

Reinforcement Learning • Updated Oct 11, 2022 • 13 • 2

sb3/tqc-Walker2DBulletEnv-v0

Reinforcement Learning • Updated Oct 11, 2022 • 4 • 1

danieladejumo/ppo-MountainCarContinuous-v0

Reinforcement Learning • Updated Jun 30, 2022 • 6 • 1

qgallouedec/ppo-LiftCube-v0

Robotics • Updated Jun 10, 2024 • 30 • 1

Makrrr/Qwen3-1.7B-GSM8K-GRPO-verl

Reinforcement Learning • 2B • Updated Jul 5, 2025 • 29 • 3

infly/inf-retriever-v1-pro

Reinforcement Learning • 7B • Updated 11 days ago • 426 • 5

infly/inf-query-aligner

Reinforcement Learning • 8B • Updated Jan 5 • 297 • 7

PrimeIntellect/INTELLECT-3

Text Generation • Updated Nov 27, 2025 • 1.24k • 207

dariakryvosheieva/video-prompt-enhancer

Reinforcement Learning • Updated Dec 10, 2025 • 10 • 2

mradermacher/inf-query-aligner-GGUF

Reinforcement Learning • 8B • Updated Dec 18, 2025 • 1.72k • 2

Maincode/Maincoder-1B

Text Generation • 1B • Updated Jan 5 • 287 • 88