[BOREA model card]

[Model Information]

Borea is a general-purpose model built on Phi-3.5-mini-Instruct. Multiple tuning methods were applied to improve performance over the base model, with particular gains in Japanese.

[Benchmark Results]

(Figure: benchmark results, provided as an image.)

TODO:

Recommended Usage Guidelines

  1. Commercial Use: If you plan to use this model for commercial purposes, we strongly encourage you to inform us via email at info@axcxept.com. This allows for potential collaboration on model applications and improvements.

  2. Attribution: When using or adapting this model, we recommend providing attribution as follows: "This project utilizes HODACHI/Borea-Phi-3.5-mini-Instruct-Jp, a model based on Phi-3.5-mini-Instruct and fine-tuned by Axcxept co., ltd."

  3. Feedback: We welcome any feedback on your experience with the model. Please feel free to email us at info@axcxept.com.

Please note that these are recommendations and not legal requirements.

[Usage]

Here are some code snippets to quickly get started with the model. First, run:

pip install flash_attn==2.5.8
pip install accelerate==0.31.0
pip install transformers==4.43.0
pip install -U trl
pip install pytest

Then, copy the snippet from the relevant section for your use case.

[Chat Template]

<|system|>
ใ‚ใชใŸใฏๆ—ฅๆœฌ่ชž่ƒฝๅŠ›ใŒ้ซ˜ใ„้ซ˜ๅบฆใชAIใงใ™ใ€‚็‰นๅˆฅใชๆŒ‡็คบใŒใชใ„้™ใ‚Šๆ—ฅๆœฌ่ชžใง่ฟ”็ญ”ใ—ใฆใใ ใ•ใ„ใ€‚<|end|>
<|user|>
ใ€Œ็”Ÿใ็‰ฉใƒ‡ใ‚ถใ‚คใƒŠใƒผใ€ใจใ„ใ†่ทๆฅญใŒใ‚ใ‚Šใพใ™ใ€‚ใ“ใ‚Œใฏใ€่‡ชๅˆ†ใŒ่€ƒใˆใŸใ‚ชใƒชใ‚ธใƒŠใƒซใฎ็”Ÿใ็‰ฉใ‚’ใƒ‡ใ‚ถใ‚คใƒณใ—ใ€ๅฎŸ้š›ใซDNAใ‚’็ทจ้›†ใ—ใฆไฝœใ‚Šๅ‡บใ™ไป•ไบ‹ใงใ™ใ€‚ใ‚ใชใŸใŒ็”Ÿใ็‰ฉใƒ‡ใ‚ถใ‚คใƒŠใƒผใงใ‚ใ‚‹ๅ ดๅˆใ€ใฉใ‚“ใช็”Ÿใ็‰ฉใ‚’ไฝœใ‚ŠใŸใ„ใงใ™ใ‹๏ผŸใพใŸใ€ใใฎ็”Ÿใ็‰ฉใŒๆŒใค็‰นๅพดใ‚„่ƒฝๅŠ›ใซใคใ„ใฆ่ชฌๆ˜Žใ—ใฆใใ ใ•ใ„ใ€‚
<|end|>
<|assistant|>

Loading the model locally

After obtaining the Borea-Phi-3.5-mini-Instruct-Jp model checkpoint, you can use this sample code for inference.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

model = AutoModelForCausalLM.from_pretrained(
    "HODACHI/Borea-Phi-3.5-mini-Instruct-Jp", 
    device_map="cuda", 
    torch_dtype="auto", 
    trust_remote_code=True, 
)
tokenizer = AutoTokenizer.from_pretrained("HODACHI/Borea-Phi-3.5-mini-Instruct-Jp")

messages = [
    {"role": "system", "content": "ใ‚ใชใŸใฏๆ—ฅๆœฌ่ชž่ƒฝๅŠ›ใŒ้ซ˜ใ„้ซ˜ๅบฆใชAIใงใ™ใ€‚็‰นๅˆฅใชๆŒ‡็คบใŒใชใ„้™ใ‚Šๆ—ฅๆœฌ่ชžใง่ฟ”็ญ”ใ—ใฆใใ ใ•ใ„ใ€‚"},
    {"role": "user", "content": "ใ€Œ็”Ÿใ็‰ฉใƒ‡ใ‚ถใ‚คใƒŠใƒผใ€ใจใ„ใ†่ทๆฅญใŒใ‚ใ‚Šใพใ™ใ€‚ใ“ใ‚Œใฏใ€่‡ชๅˆ†ใŒ่€ƒใˆใŸใ‚ชใƒชใ‚ธใƒŠใƒซใฎ็”Ÿใ็‰ฉใ‚’ใƒ‡ใ‚ถใ‚คใƒณใ—ใ€ๅฎŸ้š›ใซDNAใ‚’็ทจ้›†ใ—ใฆไฝœใ‚Šๅ‡บใ™ไป•ไบ‹ใงใ™ใ€‚ใ‚ใชใŸใŒ็”Ÿใ็‰ฉใƒ‡ใ‚ถใ‚คใƒŠใƒผใงใ‚ใ‚‹ๅ ดๅˆใ€ใฉใ‚“ใช็”Ÿใ็‰ฉใ‚’ไฝœใ‚ŠใŸใ„ใงใ™ใ‹๏ผŸใพใŸใ€ใใฎ็”Ÿใ็‰ฉใŒๆŒใค็‰นๅพดใ‚„่ƒฝๅŠ›ใซใคใ„ใฆ่ชฌๆ˜Žใ—ใฆใใ ใ•ใ„ใ€‚"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 1024,
    "return_full_text": False,
    # Greedy decoding: with do_sample=False, sampling parameters such as temperature are not used.
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])
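
When the pipeline receives a list of messages, it applies the tokenizer's chat template before generation. As a rough illustration of the prompt layout described under [Chat Template], the sketch below renders a message list into a single string. The helper name render_chat_prompt is hypothetical, the English message contents are placeholders, and the exact whitespace is an assumption; for real inference, tokenizer.apply_chat_template remains the authoritative source of the format.

```python
from typing import Dict, List


def render_chat_prompt(messages: List[Dict[str, str]],
                       add_generation_prompt: bool = True) -> str:
    """Render messages in the <|role|> ... <|end|> layout (illustrative sketch only)."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role: {role}")
        # Each turn is a role tag, the content, and an end-of-turn marker.
        parts.append(f"<|{role}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # A trailing assistant tag cues the model to produce its reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


example = render_chat_prompt([
    {"role": "system", "content": "Reply in Japanese unless instructed otherwise."},
    {"role": "user", "content": "Describe a creature you would design."},
])
print(example)
```

Building the string by hand like this is mainly useful for inspecting or debugging prompts; passing the message list to the pipeline, as in the sample code, avoids any mismatch with the template shipped with the tokenizer.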

Note: To use flash attention, pass attn_implementation="flash_attention_2" to AutoModelForCausalLM.from_pretrained().
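
A minimal sketch of how the flash attention option could be toggled, assuming the same loading arguments as the sample code. The helper names build_load_kwargs and load_model are hypothetical and only illustrate where attn_implementation is passed; flash attention additionally requires the flash_attn package and a compatible GPU.

```python
from typing import Any, Dict

MODEL_ID = "HODACHI/Borea-Phi-3.5-mini-Instruct-Jp"


def build_load_kwargs(use_flash_attention: bool = False) -> Dict[str, Any]:
    """Return keyword arguments for AutoModelForCausalLM.from_pretrained()."""
    kwargs: Dict[str, Any] = {
        "device_map": "cuda",
        "torch_dtype": "auto",
        "trust_remote_code": True,
    }
    if use_flash_attention:
        # Only set when requested, so the default attention path stays unchanged.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_model(use_flash_attention: bool = False):
    """Load the model with or without flash attention (hypothetical helper)."""
    # Imported lazily so build_load_kwargs can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        MODEL_ID, **build_load_kwargs(use_flash_attention)
    )


print(build_load_kwargs(use_flash_attention=True))
```

Calling load_model(use_flash_attention=True) would then load the checkpoint with the option enabled, while the default call matches the sample code.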

[Model Data]

[Training Dataset]

We extracted only high-quality data from Japanese Wikipedia and FineWeb and used it to create instruction data. Although this model focuses on Japanese data, the training approach is intended to improve performance across languages and domains, so it can be applied to use cases worldwide.

https://huggingface.co/datasets/legacy-datasets/wikipedia
https://huggingface.co/datasets/HuggingFaceFW/fineweb

Data Preprocessing

We trained the model on exemplary responses using a pre-instruction tuning method. This approach improves the model's ability to understand and generate high-quality responses across various languages and contexts.

Implementation Information

[Pre-Instruction Training]

https://huggingface.co/instruction-pretrain/instruction-synthesizer

[Disclaimer]

This model is provided for research and development purposes only and should be regarded as an experimental prototype. It is not intended for commercial use or for deployment in mission-critical environments. Use of this model is at the user's own responsibility, and its performance and results are not guaranteed. Axcxept co., ltd. accepts no liability whatsoever for any direct, indirect, special, incidental, or consequential damages, or for any loss arising from the use of this model, regardless of the results obtained. Users are expected to fully understand the risks associated with using this model and to use it at their own discretion.

[Hardware]

H100 PCIe × 8 (running time: 2h)

[We are.]

Axcxept logo
