Update README.md
README.md
@@ -13,8 +13,7 @@ language:
 
 # ReasoningCore-Llama-3B-R1-aligned
 
-**ReasoningCore-Llama-3B-R1-aligned** is a multilingual, reasoning‑enhanced large language model developed by
-
+**ReasoningCore-Llama-3B-R1-aligned** is a multilingual, reasoning‑enhanced large language model developed by EpistemeAI. It is trained with supervised fine-tuning on an alignment and safety dataset to steer it toward safe responses.
 ### We used GRPO technique:
 
 To provide a comprehensive overview of Group Relative Policy Optimization (GRPO), a post-training technique for Large Language Models (LLMs), and its application in the DeepSeek-R1 model.