Collections
Discover the best community collections!

Collections including paper arxiv:2305.18290

- High-Resolution Image Synthesis with Latent Diffusion Models
  Paper • 2112.10752 • Published • 14
- Adding Conditional Control to Text-to-Image Diffusion Models
  Paper • 2302.05543 • Published • 57
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 63

- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 144
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 247
- The Llama 3 Herd of Models
  Paper • 2407.21783 • Published • 117

- Attention Is All You Need
  Paper • 1706.03762 • Published • 105
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 54
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  Paper • 2101.03961 • Published • 13
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11

- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 24
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published • 1
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 2

- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 63
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 138

- EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
  Paper • 2509.22576 • Published • 134
- AgentBench: Evaluating LLMs as Agents
  Paper • 2308.03688 • Published • 25
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 63

- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 63
- Towards Efficient and Exact Optimization of Language Model Alignment
  Paper • 2402.00856 • Published
- A General Theoretical Paradigm to Understand Learning from Human Preferences
  Paper • 2310.12036 • Published • 17
- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 14
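
Every collection above includes Direct Preference Optimization (arxiv:2305.18290), so the DPO objective is the common thread. Below is a minimal PyTorch sketch of that loss, assuming per-sequence log-probabilities under the trained policy and a frozen reference model have already been computed; the function name and tensor layout are illustrative, not taken from any listed paper's code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Each argument is a (batch,) tensor of per-sequence log-probabilities,
    i.e. log pi(y|x) summed over the response tokens.
    """
    # Implicit reward of each response: beta * log(pi_theta / pi_ref).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry logistic loss on the reward margin: push the preferred
    # response's implicit reward above the dispreferred one's.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Because the reward is defined implicitly by the policy-to-reference log-ratio, no separate reward model or PPO-style sampling loop is needed, which is the paper's central claim.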