Update README.md
README.md (CHANGED)
@@ -11,9 +11,7 @@ pipeline_tag: text-generation
 
 <!-- markdownlint-disable first-line-h1 -->
 <!-- markdownlint-disable html -->
-
-<img src="./assets/imgs/orion_start.PNG" alt="logo" width="50%" />
-</div>
+
 
 <div align="center">
 <h1>
@@ -28,7 +26,7 @@ pipeline_tag: text-generation
 <p>
 <b>🌐English</b> |
 <a href="https://huggingface.co/OrionStarAI/Orion-14B-Chat-Int4/blob/main/README_zh.md">🇨🇳中文</a><br><br>
-🤗 <a href="https://huggingface.co/OrionStarAI" target="_blank">HuggingFace Mainpage</a> | 🤖 <a href="https://modelscope.cn/organization/OrionStarAI" target="_blank">ModelScope Mainpage</a><br>💬 <a href="https://huggingface.co/spaces/OrionStarAI/Orion-14B-App-Demo" target="_blank">HuggingFace Demo</a> | 🎫 <a href="https://modelscope.cn/studios/OrionStarAI/Orion-14B-App-Demo/summary" target="_blank">ModelScope Demo</a>
+🤗 <a href="https://huggingface.co/OrionStarAI" target="_blank">HuggingFace Mainpage</a> | 🤖 <a href="https://modelscope.cn/organization/OrionStarAI" target="_blank">ModelScope Mainpage</a><br>💬 <a href="https://huggingface.co/spaces/OrionStarAI/Orion-14B-App-Demo" target="_blank">HuggingFace Demo</a> | 🎫 <a href="https://modelscope.cn/studios/OrionStarAI/Orion-14B-App-Demo/summary" target="_blank">ModelScope Demo</a>
 <p>
 </h4>
 
@@ -42,12 +40,14 @@ pipeline_tag: text-generation
 - [🔗 Model Download](#model-download)
 - [🔖 Model Benchmark](#model-benchmark)
 - [📊 Model Inference](#model-inference)
-- [📜 Declarations & License](#declarations-license)
 - [🥇 Company Introduction](#company-introduction)
+- [📜 Declarations & License](#declarations-license)
 
 # 1. Model Introduction
 
-- Orion-14B
+- Orion-14B-Chat is fine-tuned from Orion-14B-Base on a high-quality corpus of approximately 850,000 entries (SFT only), and it also supports Chinese, English, Japanese, and Korean. It performs exceptionally well on the MT-Bench and AlignBench evaluation sets, significantly surpassing other models of the same parameter scale on multiple metrics. For details, please refer to the [tech report](https://github.com/OrionStarAI/Orion/blob/master/doc/Orion14B_v3.pdf).
+
+- The 850,000-entry fine-tuning corpus comprises two parts: approximately 220,000 manually curated high-quality entries, and 630,000 entries selected from open-source data through model-based filtering and semantic deduplication. Among these, the Japanese and Korean data, totaling 70,000 entries, have undergone only basic cleaning and deduplication.
 
 - The Orion-14B series models exhibit the following features:
   - Among models at the 20B-parameter scale, the Orion-14B-Base model shows outstanding performance in comprehensive evaluations.
@@ -174,8 +174,7 @@ Model release and download links are provided in the table below:
 | Llama2-13B-Chat    | 3.05 | 3.79 | 5.43 | 4.40 | 6.76 | 6.63 | 6.99 | 5.65 | 4.70 |
 | InternLM-20B-Chat  | 3.39 | 3.92 | 5.96 | 5.50 |**7.18**| 6.19 | 6.49 | 6.22 | 4.96 |
 | **Orion-14B-Chat** | 4.00 | 4.24 | 6.18 |**6.57**| 7.16 |**7.36**|**7.16**|**6.99**| 5.51 |
-
-\* use vllm for inference
+\* use vllm for inference
 
 ## 3.3. LongChat Model Orion-14B-LongChat Benchmarks
 ### 3.3.1. LongChat evaluation of LongBench