Commit db64d78 by pathcosmos · verified · 1 Parent(s): 8b82c63

docs: add Korean model card + contact info

Files changed (1): README.md (+301 -1)
@@ -82,6 +82,305 @@ model-index:
  # FRANKENSTALLM 3B

  > **A Korean 3B LLM built entirely from scratch (tokenizer, pretraining, SFT, and ORPO) on 8× NVIDIA B200 GPUs.**

  | | |
@@ -366,7 +665,8 @@ ollama run frankenstallm
  ---

- ## Links

  - **GitHub**: [pathcosmos/FRANKENSTALLM](https://github.com/pathcosmos/FRANKENSTALLM) (full source code, training scripts, and builder's log)
  - **HuggingFace**: [pathcosmos/frankenstallm](https://huggingface.co/pathcosmos/frankenstallm)
 
 
  # FRANKENSTALLM 3B

+ > **We built a Korean 3B LLM from scratch, from tokenizer training through pretraining, SFT, and ORPO, on 8× NVIDIA B200 GPUs.**
+
+ | | |
+ |---|---|
+ | **Developer** | [pathcosmos](https://huggingface.co/pathcosmos) |
+ | **Parameters** | ~2.4B (with weight tying, 3B class) |
+ | **Languages** | Korean (primary), English (secondary) |
+ | **License** | Apache 2.0 |
+ | **Training** | 3 phases: pretraining → SFT → ORPO |
+ | **Hardware** | 8× NVIDIA B200 (FP8), ~86 hours total |
+
+ ---
+
+ ## Quick Start
+
+ ### Transformers
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_id = "pathcosmos/frankenstallm"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ # Prompt: "Please explain kimchi, one of Korea's traditional foods."
+ inputs = tokenizer(
+     "한국의 전통 음식 중 김치에 대해 설명해주세요.",
+     return_tensors="pt"
+ ).to(model.device)
+
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,
+         do_sample=True,
+         temperature=0.7,
+         repetition_penalty=1.2,  # recommended
+         top_p=0.9,
+         max_new_tokens=512,
+         pad_token_id=tokenizer.eos_token_id,
+     )
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ### Ollama (GGUF)
+
+ ```bash
+ # Download the GGUF file and Modelfile
+ huggingface-cli download pathcosmos/frankenstallm \
+     gguf/frankenstallm-3b-v2-Q4_K_M.gguf \
+     gguf/Modelfile.3b-v2-Q4_K_M \
+     --local-dir ./frankenstallm
+
+ # Edit the FROM path in the Modelfile, then create the model
+ ollama create frankenstallm -f ./frankenstallm/gguf/Modelfile.3b-v2-Q4_K_M
+
+ # Run
+ ollama run frankenstallm
+ ```
+
+ ---
+
+ ## Model Highlights
+
+ - **Korean tokenizer built from scratch**: SentencePiece Unigram, 64K vocabulary, 99.95% Korean character coverage
+ - **3-phase training pipeline**: pretraining (57K steps, ~60B tokens) → SFT (25.5K steps, 2.4M samples) → ORPO (10K steps, 630K preference pairs)
+ - **Native FP8 training on B200**: TransformerEngine MXFP8, with a theoretical 2× throughput over BF16
+ - **GGUF distribution**: Q4_K_M (757MB), Q8_0 (1.2GB), F16 (2.3GB), plus Ollama Modelfiles
+
+ ---
+
+ ## Architecture
+
+ | Component | Value |
+ |-----------|-----|
+ | Structure | Decoder-only Transformer (LLaMA style) |
+ | Hidden size | 3,072 |
+ | Layers | 28 |
+ | Attention heads | 24 |
+ | KV heads | 8 (GQA 3:1) |
+ | FFN dimension | 8,192 (SwiGLU) |
+ | Vocabulary size | 64,000 |
+ | Context length | 4,096 (2,048 during training) |
+ | Positional encoding | RoPE (θ=500,000) |
+ | Normalization | Pre-norm RMSNorm |
+ | Attention implementation | FlashAttention-2 |
+ | Precision | FP8 (TransformerEngine MXFP8) |
+ | Weight tying | Applied (embedding ↔ lm_head) |
+
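The GQA row in the architecture table (24 query heads sharing 8 KV heads) mainly matters for inference memory. As a rough sketch, assuming a head dimension of hidden size divided by attention heads and a 16-bit KV cache (neither is stated in the card), the cache size at the full 4,096-token context can be estimated as follows. `kv_cache_bytes` is a hypothetical helper, not part of the repository.

```python
# Rough KV-cache estimate from the architecture table.
# Assumptions (not stated in the card): head_dim = hidden_size / n_heads,
# and keys/values are cached at 2 bytes per element (FP16/BF16).

HIDDEN_SIZE = 3072
N_LAYERS = 28
N_HEADS = 24
N_KV_HEADS = 8
CONTEXT_LEN = 4096
BYTES_PER_ELEMENT = 2


def kv_cache_bytes(n_kv_heads: int, seq_len: int) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    head_dim = HIDDEN_SIZE // N_HEADS  # 128 under the assumption above
    # Factor 2 covers the separate key and value tensors in every layer.
    per_token = 2 * N_LAYERS * n_kv_heads * head_dim * BYTES_PER_ELEMENT
    return per_token * seq_len


gqa_bytes = kv_cache_bytes(N_KV_HEADS, CONTEXT_LEN)
mha_bytes = kv_cache_bytes(N_HEADS, CONTEXT_LEN)  # hypothetical no-GQA baseline

print(f"GQA cache: {gqa_bytes / 2**20:.0f} MiB")
print(f"MHA cache: {mha_bytes / 2**20:.0f} MiB")
print(f"Reduction: {mha_bytes // gqa_bytes}x")
```

Under these assumptions the 3:1 grouping cuts the per-sequence cache to one third of what an equivalent multi-head layout would need.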
+ ---
+
+ ## Training Pipeline
+
+ ### Phase 1: Pretraining
+
+ | Item | Value |
+ |------|-----|
+ | Steps | 57,000 |
+ | Final loss | 1.466 |
+ | Training tokens | ~60B (38.5B unique × ~1.5 epochs) |
+ | Duration | ~63 hours |
+ | Data | CC-100 KO, HPLT KO, C4 KO, Namuwiki, Wikipedia KO, Cosmopedia (EN) |
+ | Batch size | 5 × 8 GPU × 8 accum × 2,048 seq = ~650K tokens/step |
+
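The batch-size row of the Phase 1 table is a product of four factors. A minimal sketch of that arithmetic follows, assuming the leading 5 is the micro-batch size per GPU (the card does not label it); the variable names are illustrative only.

```python
# Effective batch size from the Phase 1 table:
# micro-batch per GPU x GPUs x gradient-accumulation steps x sequence length.
MICRO_BATCH_PER_GPU = 5  # assumption: the table's leading "5"
N_GPUS = 8
GRAD_ACCUM_STEPS = 8
SEQ_LEN = 2048

sequences_per_step = MICRO_BATCH_PER_GPU * N_GPUS * GRAD_ACCUM_STEPS
tokens_per_step = sequences_per_step * SEQ_LEN

# Token budget as the table states it: unique tokens x approximate epochs.
UNIQUE_TOKENS = 38.5e9
APPROX_EPOCHS = 1.5
token_budget = UNIQUE_TOKENS * APPROX_EPOCHS

print(f"{sequences_per_step} sequences/step, {tokens_per_step:,} tokens/step")
print(f"~{token_budget / 1e9:.2f}B tokens from unique x epochs")
```

The product is 655,360 tokens per optimizer step, which the table rounds to ~650K.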
+ ### Phase 2: SFT (Supervised Fine-Tuning)
+
+ | Item | Value |
+ |------|-----|
+ | Steps | 25,500 (stopped early at the 77.3% mark) |
+ | Best val_loss | 1.8851 (step 23,000) |
+ | Duration | ~15.5 hours |
+ | Data | 24 sources, 2,439,397 samples (7.48 GB) |
+ | Mix | 70% SFT + 30% pretraining replay (to prevent catastrophic forgetting) |
+ | Knowledge forgetting rate | 0.9% (across 19 datasets) |
+
+ ### Phase 3: ORPO (Preference Optimization)
+
+ | Item | Value |
+ |------|-----|
+ | Steps | 9,997 (early convergence) |
+ | Best eval_loss | 1.625 |
+ | Preference accuracy | 76.02% |
+ | Reward margin | 0.6100 |
+ | Duration | ~7 hours |
+ | Data | 7 Korean HF datasets, ~630K preference pairs |
+ | Hyperparameters | beta=0.25, lr=1.2e-5, eff_batch=128 |
+
+ **Total training time: about 86 hours on 8× B200**
+
+ ---
+
+ ## Benchmarks
+
+ ### Performance by Training Stage (Base → SFT → ORPO)
+
+ | Benchmark | Base | SFT | ORPO | Change (Base→ORPO) |
+ |-----------|:----:|:---:|:----:|:---:|
+ | **KoBEST average (0-shot)** | 43.7% | 43.3% | **52.8%** | **+9.1pp** |
+ | KoBEST COPA | 49.3% | 48.6% | **63.9%** | +14.6pp |
+ | KoBEST HellaSwag-KO | 21.6% | 19.8% | **38.0%** | +16.4pp |
+ | KoBEST SentiNeg | 48.6% | 49.1% | **62.5%** | +13.9pp |
+ | KoBEST BoolQ | 50.3% | 50.1% | 50.6% | +0.3pp |
+ | PIQA | 52.5% | 52.6% | **59.9%** | +7.3pp |
+ | ARC-Easy | 25.6% | 25.9% | **36.0%** | +10.4pp |
+ | HAE-RAE | 19.7% | 19.9% | 21.8% | +2.1pp |
+ | HellaSwag EN | 26.2% | 26.1% | 29.2% | +3.0pp |
+ | Greedy 3-gram repetition rate | 61.0% | 73.0% | **30.9%** | -30.1pp |
+ | EOS termination rate | 0% | 60% | **67%** | +67pp |
+ | PPL forgetting rate | n/a | 0.9% | 4.1% | within 15% ✅ |
+
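The card does not say how the greedy 3-gram repetition rate is measured. One common definition, shown here purely as an assumed sketch, is the share of 3-grams in a generation that duplicate an earlier 3-gram in the same output. `ngram_repetition_rate` is a hypothetical helper, and whitespace tokenization is a simplification.

```python
from collections import Counter


def ngram_repetition_rate(tokens: list[str], n: int = 3) -> float:
    """Fraction of n-grams that repeat an n-gram seen earlier in `tokens`."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repetition.
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


looping = "the cat sat the cat sat the cat sat".split()
varied = "one two three four five six seven".split()

print(ngram_repetition_rate(looping))
print(ngram_repetition_rate(varied))
```

A looping output scores high on this metric and a non-repeating one scores zero, which is the behavior the Base, SFT, and ORPO columns are comparing.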
+ ### 3B-Class Model Comparison (Ollama, 35 tests)
+
+ | Model | Parameters | Korean NLU | Knowledge | Instruction following | Reasoning | Average score |
+ |-------|:------:|:----------:|:----:|:---------:|:----:|:---------:|
+ | Qwen 2.5 3B | 3B | 100.0 | 20.8 | 55.6 | 62.5 | **63.4** |
+ | Phi-4 Mini | 3.8B | 66.7 | 29.2 | 33.3 | **87.5** | 60.6 |
+ | **FRANKENSTALLM 3B** | **3B** | **100.0** | **75.0** | **66.7** | 50.0 | 46.7 |
+
+ > FRANKENSTALLM leads in **Korean NLU** (tied with Qwen), **Korean knowledge** (75.0 vs 20.8/29.2), and **instruction following** (66.7 vs 55.6/33.3).
+
+ ### Inference Speed (Ollama, Q4_K_M)
+
+ | Model | Avg TTFT | TPS | Notes |
+ |-------|:--------:|:---:|------|
+ | **FRANKENSTALLM 3B** | **16.7ms** | **142.5** | Fastest |
+ | Phi-4 Mini 3.8B | 25.6ms | 100.4 | |
+ | Qwen 2.5 3B | 28.2ms | 93.8 | |
+
+ ### Perplexity Preservation (Knowledge Retention after ORPO)
+
+ | Dataset | Base PPL | ORPO PPL | Forgetting rate |
+ |---------|:--------:|:--------:|:------:|
+ | Korean C4 | 5.72 | 5.87 | +2.7% |
+ | Korean Wiki | 11.84 | 12.21 | +3.2% |
+ | Max forgetting rate | n/a | n/a | 4.1% ✅ |
+
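The forgetting rate in the perplexity table appears to be the relative increase in perplexity from the base model to the ORPO model. That reading is an assumption, since the card does not give the formula. A minimal check under that assumption, where `forgetting_rate` is a hypothetical helper:

```python
def forgetting_rate(base_ppl: float, tuned_ppl: float) -> float:
    """Relative perplexity increase, in percent (assumed definition)."""
    return (tuned_ppl - base_ppl) / base_ppl * 100.0


# Rounded perplexities copied from the Korean C4 and Korean Wiki rows.
rows = {
    "Korean C4": (5.72, 5.87),
    "Korean Wiki": (11.84, 12.21),
}

for name, (base, orpo) in rows.items():
    print(f"{name}: {forgetting_rate(base, orpo):+.2f}%")
```

Because the listed perplexities are rounded to two decimals, the recomputed values land within about 0.1 percentage points of the listed +2.7% and +3.2% rather than matching them exactly.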
+ ---
+
+ ## Training Data
+
+ ### Pretraining (~38.5B tokens)
+
+ | Category | Sources | Estimated tokens |
+ |------|------|:-----------:|
+ | Korean web crawl | C4 KO, CC-100 KO, HPLT KO | ~17.2B |
+ | Korean encyclopedia | Wikipedia KO, Namuwiki (2 versions) | ~2.8B |
+ | English educational | Cosmopedia (Stories, Web, Stanford, WikiHow, OpenStax, Khan) | ~5.7B |
+ | English math and science | AutoMathText, OpenWebMath, Proof-Pile-2 | ~8.5B |
+ | Code | StarCoder (filtered) | ~4.3B |
+
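The per-category estimates in the pretraining table can be summed to confirm the ~38.5B heading and to see each category's share. This is only a sketch over the table's rounded figures, with category labels given in English:

```python
# Estimated token counts per category, in billions, from the pretraining table.
pretraining_mix = {
    "Korean web crawl": 17.2,
    "Korean encyclopedia": 2.8,
    "English educational": 5.7,
    "English math and science": 8.5,
    "Code": 4.3,
}

total = sum(pretraining_mix.values())
print(f"Total: ~{total:.1f}B tokens")
for category, tokens in pretraining_mix.items():
    print(f"{category:<26}{tokens / total:6.1%}")
```

Under these estimates the two Korean categories together account for roughly 52% of the unique tokens.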
+ ### SFT (2.4M samples, 24 sources)
+
+ | Domain | Share | Key datasets |
+ |------|:----:|-------------|
+ | Reasoning/CoT | 38% | reasoning_r1_1.4m, magpie_reasoning |
+ | Korean instructions | 23% | korean_instruction_mix, open_korean_instructions, kullm_v2 |
+ | English general | 16% | openhermes_2.5, ultrachat_200k |
+ | Math | 12% | NuminaMath-CoT, orca-math-ko |
+ | Dialogue/code/other | 11% | smol-koreantalk, Evol-Instruct-Code-80k-ko |
+
+ ### ORPO (~630K preference pairs, 7 sources)
+
+ | Dataset | Size | Domain |
+ |---------|:----:|------|
+ | nayohan/preference-collection-ko-full | 4.9GB | General preference |
+ | heegyu/orca-math-korean-preference-cleaned | 1.6GB | Math reasoning |
+ | kuotient/orca-math-korean-dpo-pairs | 750MB | Math DPO |
+ | maywell/ko_Ultrafeedback_binarized | 394MB | Feedback alignment |
+ | tellang/yeji-preference-ko-v1 | 171MB | General preference |
+ | jojo0217/korean_rlhf_dataset | 137MB | RLHF pairs |
+ | lemon-mint/korean-realqa-reasoning-v01-preference | 58MB | QA reasoning |
+
+ ---
+
+ ## GGUF & Ollama
+
+ ### Available Quantized Files
+
+ | File | Size | Description |
+ |------|:----:|------|
+ | `gguf/frankenstallm-3b-v2-Q4_K_M.gguf` | 757MB | **Recommended**: best quality for its size |
+ | `gguf/frankenstallm-3b-v2-Q8_0.gguf` | 1.2GB | Higher quality |
+ | `gguf/frankenstallm-3b-v2-f16.gguf` | 2.3GB | Full precision |
+ | `model.safetensors` | 4.76GB | Native Transformers format (ORPO best checkpoint, byte-fallback fix applied) |
+
+ ### Recommended Sampling Parameters
+
+ | Parameter | Value | Notes |
+ |---------|:---:|------|
+ | `temperature` | 0.7 | Best Korean generation quality |
+ | `repeat_penalty` | 1.2 | **Required**: without it, the greedy repetition rate is 30.9% |
+ | `top_p` | 0.9 | Nucleus sampling |
+ | `top_k` | 50 | Number of top-k candidates |
+ | `max_tokens` | 512 | Maximum generation length |
+ | `num_ctx` | 4096 | Context window (do not exceed) |
+
+ > ⚠️ Always use `repeat_penalty >= 1.2`. With it, the repetition rate drops to **0%**. Without it, greedy decoding produces ~31% 3-gram repetition.
+
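The recommended sampling parameters can also be passed per request through Ollama's local REST API rather than baked into a Modelfile. The sketch below assumes an Ollama server on its default port and a model already created under the name `frankenstallm`; `build_payload` and `generate` are hypothetical helpers. Ollama names the generation-length option `num_predict`, so the table's `max_tokens` value is mapped to it here.

```python
import json
import urllib.request

# Assumption: an Ollama server on its default local port, with the model
# already created as "frankenstallm" via `ollama create`.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Recommended sampling parameters from the table. Ollama calls the
# generation-length option `num_predict`; the table lists it as `max_tokens`.
RECOMMENDED_OPTIONS = {
    "temperature": 0.7,
    "repeat_penalty": 1.2,
    "top_p": 0.9,
    "top_k": 50,
    "num_predict": 512,
    "num_ctx": 4096,
}


def build_payload(prompt: str, model: str = "frankenstallm") -> dict:
    """Build a non-streaming /api/generate request body."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": dict(RECOMMENDED_OPTIONS),
    }


def generate(prompt: str) -> str:
    """Send the prompt to the local Ollama server and return its reply."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `generate(...)` needs a running server, while `build_payload(...)` can be inspected on its own to confirm that `repeat_penalty` is set before anything is sent.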
+ ---
+
+ ## Limitations
+
+ - **Limited English performance**: MMLU-EN ~23%, HellaSwag-EN ~29%. This is a Korean-focused model.
+ - **Code generation**: nearly unusable (code makes up a small share of the training data)
+ - **Greedy repetition**: 30.9% 3-gram repetition without `repeat_penalty`; always use `repeat_penalty >= 1.2`
+ - **Safety**: no safety-alignment data was included in training, so use the model with appropriate guardrails
+ - **Scale gap**: trained on ~60B tokens, versus the trillions of tokens used for commercial 3B models, so overall benchmark scores may be lower
+
+ ---
+
+ ## Hardware and Training Environment
+
+ | Component | Specification |
+ |-----------|------|
+ | GPU | 8× NVIDIA B200 (183GB HBM3e × 8, ~1.47TB total) |
+ | FP8 compute | 2,250 TFLOPS/GPU (18,000 TFLOPS total) |
+ | Interconnect | NVLink 5.0, NVSwitch all-to-all mesh |
+ | CPU | 2× AMD EPYC 9365 (72 cores, Zen 5) |
+ | RAM | 2.21 TB DDR5 |
+ | PyTorch | 2.10.0a0+b4e4ee81d3.nv25.12 (NVIDIA custom build) |
+ | TransformerEngine | 2.10.0 |
+ | FlashAttention | 2.7.4 |
+ | NCCL | 2.28.9 |
+ | CUDA | 13.1 |
+ | Total training time | ~86 hours (pretraining 63h + SFT 15.5h + ORPO 7h) |
+
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @misc{frankenstallm2026,
+   title={FRANKENSTALLM: A Korean 3B LLM Built From Scratch on B200 GPUs},
+   author={pathcosmos},
+   year={2026},
+   url={https://huggingface.co/pathcosmos/frankenstallm},
+   note={3-phase training (Pretrain, SFT, ORPO) with FP8 on 8x NVIDIA B200}
+ }
+ ```
+
+ ---
+
+ ## Links & Contact
+
+ - **GitHub**: [pathcosmos/FRANKENSTALLM](https://github.com/pathcosmos/FRANKENSTALLM) (full source code, training scripts, and builder's log)
+ - **HuggingFace**: [pathcosmos/frankenstallm](https://huggingface.co/pathcosmos/frankenstallm)
+ - **Contact**: pathcosmos@gmail.com
+
+ ---
+ ---
+
+ > 🇺🇸 **English version below**
+
+ ---
+
+ # FRANKENSTALLM 3B
+
  > **A Korean 3B LLM built entirely from scratch (tokenizer, pretraining, SFT, and ORPO) on 8× NVIDIA B200 GPUs.**

  | | |
 
  ---

+ ## Links & Contact

  - **GitHub**: [pathcosmos/FRANKENSTALLM](https://github.com/pathcosmos/FRANKENSTALLM) (full source code, training scripts, and builder's log)
  - **HuggingFace**: [pathcosmos/frankenstallm](https://huggingface.co/pathcosmos/frankenstallm)
+ - **Contact**: pathcosmos@gmail.com