
Fine-Tuning Qwen 2.5 7B Instruct

This repository contains code for fine-tuning the Qwen 2.5 7B Instruct model on Amazon SageMaker. The project uses QLoRA (Quantized Low-Rank Adaptation) for efficient fine-tuning of large language models.

ํ”„๋กœ์ ํŠธ ๊ตฌ์กฐ

.
โ”œโ”€โ”€ scripts/
โ”‚   โ”œโ”€โ”€ train.py
โ”‚   โ”œโ”€โ”€ tokenization_qwen2.py
โ”‚   โ”œโ”€โ”€ requirements.txt
โ”‚   โ””โ”€โ”€ bootstrap.sh
โ”œโ”€โ”€ sagemaker_train.py
โ””โ”€โ”€ README.md

์‚ฌ์ „ ์š”๊ตฌ์‚ฌํ•ญ

  • Amazon SageMaker ์ ‘๊ทผ ๊ถŒํ•œ
  • Hugging Face ๊ณ„์ • ๋ฐ ์ ‘๊ทผ ํ† ํฐ
  • AWS ์ž๊ฒฉ ์ฆ๋ช… ๊ตฌ์„ฑ
  • Python 3.10+

ํ™˜๊ฒฝ ์„ค์ •

ํ”„๋กœ์ ํŠธ์—์„œ ์‚ฌ์šฉํ•˜๋Š” ์ฃผ์š” ์˜์กด์„ฑ:

  • PyTorch 2.1.0
  • Transformers (main ๋ธŒ๋žœ์น˜์˜ ์ตœ์‹  ๋ฒ„์ „)
  • Accelerate >= 0.27.0
  • PEFT >= 0.6.0
  • BitsAndBytes >= 0.41.0
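
Translated into a pip requirements file, the list above might look like the sketch below. The pins mirror the stated minimums and are illustrative; scripts/requirements.txt in the repository is authoritative, and Transformers is installed from source separately via bootstrap.sh:

```
torch==2.1.0
accelerate>=0.27.0
peft>=0.6.0
bitsandbytes>=0.41.0
# Transformers comes from the main branch via bootstrap.sh, e.g.:
# pip install git+https://github.com/huggingface/transformers.git
```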

๋ชจ๋ธ ๊ตฌ์„ฑ

  • ๊ธฐ๋ณธ ๋ชจ๋ธ: Qwen/Qwen2.5-7B-Instruct
  • ํ•™์Šต ๋ฐฉ๋ฒ•: QLoRA (4๋น„ํŠธ ์–‘์žํ™”)
  • ์ธ์Šคํ„ด์Šค ์œ ํ˜•: ml.p5.48xlarge
  • ๋ถ„์‚ฐ ์ „๋žต: PyTorch DDP

ํ•™์Šต ๊ตฌ์„ฑ

ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ

{
    'epochs': 3,
    'per_device_train_batch_size': 4,
    'gradient_accumulation_steps': 8,
    'learning_rate': 1e-5,
    'max_steps': 1000,
    'bf16': True,
    'max_length': 2048,
    'gradient_checkpointing': True,
    'optim': 'adamw_torch',
    'lr_scheduler_type': 'cosine',
    'warmup_ratio': 0.1,
    'weight_decay': 0.01,
    'max_grad_norm': 0.3
}
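
Given the values above and a single ml.p5.48xlarge node (which ships 8 GPUs), the effective global batch size under DDP works out as follows. This is a quick sanity check, not code from the repository:

```python
# Effective global batch size under PyTorch DDP:
# per-device batch size x gradient accumulation steps x number of GPUs.
hyperparameters = {
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 8,
}
num_gpus = 8  # ml.p5.48xlarge has 8 GPUs

effective_batch_size = (
    hyperparameters["per_device_train_batch_size"]
    * hyperparameters["gradient_accumulation_steps"]
    * num_gpus
)
print(effective_batch_size)  # -> 256
```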

ํ™˜๊ฒฝ ๋ณ€์ˆ˜

ํ•™์Šต ํ™˜๊ฒฝ์€ ๋ถ„์‚ฐ ํ•™์Šต ๋ฐ ๋ฉ”๋ชจ๋ฆฌ ๊ด€๋ฆฌ๋ฅผ ์œ„ํ•œ ์ตœ์ ํ™”๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค:

  • CUDA ์žฅ์น˜ ๊ตฌ์„ฑ
  • ๋ฉ”๋ชจ๋ฆฌ ์ตœ์ ํ™” ์„ค์ •
  • ๋ถ„์‚ฐ ํ•™์Šต์„ ์œ„ํ•œ EFA(Elastic Fabric Adapter) ๊ตฌ์„ฑ
  • Hugging Face ํ† ํฐ ๋ฐ ์บ์‹œ ์„ค์ •
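
A sketch of what such an environment block could look like; the specific variable names and values below are assumptions for illustration, not taken from the repository:

```python
import os

# Hypothetical environment configuration for an EFA-enabled SageMaker node.
env = {
    "PYTORCH_CUDA_ALLOC_CONF": "max_split_size_mb:512",  # reduce fragmentation
    "FI_PROVIDER": "efa",                        # route NCCL traffic over EFA
    "FI_EFA_USE_DEVICE_RDMA": "1",               # enable GPU-direct RDMA
    "HF_TOKEN": os.environ.get("HF_TOKEN", ""),  # Hugging Face access token
    "HF_HOME": "/opt/ml/cache/huggingface",      # cache on the training volume
}
os.environ.update(env)
```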

ํ•™์Šต ํ”„๋กœ์„ธ์Šค

  1. ํ™˜๊ฒฝ ์ค€๋น„:

    • ํ•„์š”ํ•œ ์˜์กด์„ฑ์ด ํฌํ•จ๋œ requirements.txt ์ƒ์„ฑ
    • Transformers ์„ค์น˜๋ฅผ ์œ„ํ•œ bootstrap.sh ์ƒ์„ฑ
    • SageMaker ํ•™์Šต ๊ตฌ์„ฑ ์„ค์ •
  2. ๋ชจ๋ธ ๋กœ๋”ฉ:

    • 4๋น„ํŠธ ์–‘์žํ™”๋กœ ๊ธฐ๋ณธ Qwen 2.5 7B ๋ชจ๋ธ ๋กœ๋“œ
    • ์–‘์žํ™”๋ฅผ ์œ„ํ•œ BitsAndBytes ๊ตฌ์„ฑ
    • k-bit ํ•™์Šต์„ ์œ„ํ•œ ๋ชจ๋ธ ์ค€๋น„
  3. ๋ฐ์ดํ„ฐ์…‹ ์ฒ˜๋ฆฌ:

    • Sujet Finance ๋ฐ์ดํ„ฐ์…‹ ์‚ฌ์šฉ
    • Qwen2 ํ˜•์‹์œผ๋กœ ๋Œ€ํ™” ํฌ๋งทํŒ…
    • ์ตœ๋Œ€ 2048 ํ† ํฐ ๊ธธ์ด๋กœ ํ† ํฌ๋‚˜์ด์ง•
    • ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ๋ฅผ ํ†ตํ•œ ๋ฐ์ดํ„ฐ ์ „์ฒ˜๋ฆฌ ๊ตฌํ˜„
  4. ํ•™์Šต:

    • ๋ฉ”๋ชจ๋ฆฌ ํšจ์œจ์„ฑ์„ ์œ„ํ•œ gradient checkpointing ๊ตฌํ˜„
    • ์›œ์—…์ด ํฌํ•จ๋œ ์ฝ”์‚ฌ์ธ ํ•™์Šต๋ฅ  ์Šค์ผ€์ค„ ์‚ฌ์šฉ
    • 50 ์Šคํ…๋งˆ๋‹ค ์ฒดํฌํฌ์ธํŠธ ์ €์žฅ
    • 10 ์Šคํ…๋งˆ๋‹ค ํ•™์Šต ๋ฉ”ํŠธ๋ฆญ ๋กœ๊น…
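
The conversation formatting in step 3 follows Qwen2's chat layout (ChatML-style special tokens). A minimal sketch of that formatting, independent of the repository's actual preprocessing code:

```python
def format_chatml(messages):
    """Render a list of {role, content} dicts in Qwen2's ChatML-style layout."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

sample = [
    {"role": "system", "content": "You are a helpful financial assistant."},
    {"role": "user", "content": "What is EBITDA?"},
]
text = format_chatml(sample)
print(text)
```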

๋ชจ๋‹ˆํ„ฐ๋ง ๋ฐ ๋ฉ”ํŠธ๋ฆญ

ํ•™์Šต ๊ณผ์ •์—์„œ ๋‹ค์Œ ๋ฉ”ํŠธ๋ฆญ์„ ์ถ”์ ํ•ฉ๋‹ˆ๋‹ค:

  • ํ•™์Šต ์†์‹ค(Training loss)
  • ํ‰๊ฐ€ ์†์‹ค(Evaluation loss)

์˜ค๋ฅ˜ ์ฒ˜๋ฆฌ

๊ตฌํ˜„์—๋Š” ํฌ๊ด„์ ์ธ ์˜ค๋ฅ˜ ์ฒ˜๋ฆฌ ๋ฐ ๋กœ๊น…์ด ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค:

  • ํ™˜๊ฒฝ ์œ ํšจ์„ฑ ๊ฒ€์‚ฌ
  • ๋ฐ์ดํ„ฐ์…‹ ์ค€๋น„ ๊ฒ€์ฆ
  • ํ•™์Šต ํ”„๋กœ์„ธ์Šค ๋ชจ๋‹ˆํ„ฐ๋ง
  • ์ž์„ธํ•œ ์˜ค๋ฅ˜ ๋ฉ”์‹œ์ง€ ๋ฐ ์Šคํƒ ์ถ”์ 

์‚ฌ์šฉ ๋ฐฉ๋ฒ•

  1. AWS ์ž๊ฒฉ ์ฆ๋ช… ๋ฐ SageMaker ์—ญํ•  ๊ตฌ์„ฑ
  2. Hugging Face ํ† ํฐ ์„ค์ •
  3. ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ ์‹คํ–‰:
python sagemaker_train.py

์ปค์Šคํ…€ ์ปดํฌ๋„ŒํŠธ

์ปค์Šคํ…€ ํ† ํฌ๋‚˜์ด์ €

ํ”„๋กœ์ ํŠธ๋Š” ๋‹ค์Œ ๊ธฐ๋Šฅ์ด ํฌํ•จ๋œ Qwen2 ํ† ํฌ๋‚˜์ด์ €์˜ ์ปค์Šคํ…€ ๊ตฌํ˜„(tokenization_qwen2.py)์„ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค:

  • ํŠน์ˆ˜ ํ† ํฐ ์ฒ˜๋ฆฌ
  • ์œ ๋‹ˆ์ฝ”๋“œ ์ •๊ทœํ™”
  • ์–ดํœ˜ ๊ด€๋ฆฌ
  • ๋ชจ๋ธ ํ•™์Šต์„ ์œ„ํ•œ ์ž…๋ ฅ ์ค€๋น„

์ฃผ์˜์‚ฌํ•ญ

  • ํ•™์Šต ์Šคํฌ๋ฆฝํŠธ๋Š” ml.p5.48xlarge ์ธ์Šคํ„ด์Šค ํƒ€์ž…์— ์ตœ์ ํ™”๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค
  • PyTorch Distributed Data Parallel์„ ์‚ฌ์šฉํ•œ ํ•™์Šต
  • ๋ฉ”๋ชจ๋ฆฌ ์ตœ์ ํ™”๋ฅผ ์œ„ํ•œ gradient checkpointing ๊ตฌํ˜„
  • ํ•™์Šต ์‹คํŒจ์— ๋Œ€ํ•œ ์ž๋™ ์žฌ์‹œ๋„ ๋ฉ”์ปค๋‹ˆ์ฆ˜ ํฌํ•จ