# factory_qwen_results

This model is a fine-tuned version of Qwen/Qwen3-Coder-30B-A3B-Instruct on the train dataset. It achieves the following results on the evaluation set:
- Loss: 0.1424
- Accuracy: 0.9676
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0004
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 3
- total_train_batch_size: 12
- total_eval_batch_size: 8
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.08
- num_epochs: 3.0
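The per-device and total batch sizes above are related through the device count and the accumulation steps. A minimal sketch of that arithmetic (variable names are illustrative, not taken from the training script):

```python
# How the total batch sizes in the list above are derived.
per_device_train_batch_size = 1
per_device_eval_batch_size = 2
num_devices = 4
gradient_accumulation_steps = 3

# Gradients are accumulated over 3 micro-batches on each of the 4 GPUs,
# so one optimizer step effectively sees 1 * 4 * 3 = 12 training examples.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)

# Evaluation does no accumulation: 2 examples per GPU across 4 GPUs.
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 12 8
```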
### Training results
| Training Loss | Epoch | Step | Accuracy | Validation Loss |
|---|---|---|---|---|
| 0.2607 | 0.0811 | 30 | 0.9369 | 0.2531 |
| 0.2818 | 0.1622 | 60 | 0.9464 | 0.2187 |
| 0.193 | 0.2432 | 90 | 0.9497 | 0.2058 |
| 0.1835 | 0.3243 | 120 | 0.9512 | 0.1971 |
| 0.1586 | 0.4054 | 150 | 0.9528 | 0.1891 |
| 0.141 | 0.4865 | 180 | 0.9552 | 0.1821 |
| 0.1359 | 0.5676 | 210 | 0.9561 | 0.1726 |
| 0.1038 | 0.6486 | 240 | 0.9574 | 0.1720 |
| 0.1784 | 0.7297 | 270 | 0.9578 | 0.1632 |
| 0.3386 | 0.8108 | 300 | 0.9590 | 0.1573 |
| 0.1101 | 0.8919 | 330 | 0.9609 | 0.1555 |
| 0.1123 | 0.9730 | 360 | 0.9619 | 0.1513 |
| 0.0956 | 1.0541 | 390 | 0.9618 | 0.1552 |
| 0.0802 | 1.1351 | 420 | 0.9634 | 0.1525 |
| 0.0671 | 1.2162 | 450 | 0.9634 | 0.1519 |
| 0.0738 | 1.2973 | 480 | 0.9639 | 0.1493 |
| 0.0622 | 1.3784 | 510 | 0.9639 | 0.1477 |
| 0.063 | 1.4595 | 540 | 0.9658 | 0.1435 |
| 0.0593 | 1.5405 | 570 | 0.9654 | 0.1499 |
| 0.2748 | 1.6216 | 600 | 0.9666 | 0.1479 |
| 0.0804 | 1.7027 | 630 | 0.9661 | 0.1440 |
| 0.0631 | 1.7838 | 660 | 0.9668 | 0.1427 |
| 0.0414 | 1.8649 | 690 | 0.9668 | 0.1446 |
| 0.0507 | 1.9459 | 720 | 0.9676 | 0.1424 |
| 0.0261 | 2.0270 | 750 | 0.9689 | 0.1542 |
| 0.0324 | 2.1081 | 780 | 0.9688 | 0.1578 |
| 0.0291 | 2.1892 | 810 | 0.9681 | 0.1501 |
| 0.0205 | 2.2703 | 840 | 0.9684 | 0.1578 |
| 0.0271 | 2.3514 | 870 | 0.9688 | 0.1545 |
| 0.0185 | 2.4324 | 900 | 0.9684 | 0.1644 |
| 0.0243 | 2.5135 | 930 | 0.9695 | 0.1571 |
| 0.0218 | 2.5946 | 960 | 0.9703 | 0.1562 |
| 0.0229 | 2.6757 | 990 | 0.9701 | 0.1565 |
| 0.028 | 2.7568 | 1020 | 0.9699 | 0.1583 |
| 0.0193 | 2.8378 | 1050 | 0.9703 | 0.1578 |
| 0.0192 | 2.9189 | 1080 | 0.9702 | 0.1598 |
| 0.0231 | 3.0 | 1110 | 0.9702 | 0.1610 |
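The Epoch column advances in increments of roughly 0.0811 because evaluation runs every 30 optimizer steps, and 1110 total steps over 3 epochs implies 370 steps per epoch. A quick sanity check of that reading of the table:

```python
# Sanity-check the Epoch column: 1110 total steps over 3 epochs
# implies 370 optimizer steps per epoch (an inference from the table,
# not a value stated in the training config).
total_steps = 1110
num_epochs = 3
steps_per_epoch = total_steps / num_epochs  # 370.0

eval_every = 30  # evaluation interval, read off the Step column
epoch_at_first_eval = eval_every / steps_per_epoch

print(round(epoch_at_first_eval, 4))  # 0.0811, matching the first row
```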
### Framework versions

- PEFT 0.17.1
- Transformers 4.57.1
- PyTorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.2
## Model tree for finalform/velocityFoamQwen-30B

Base model: Qwen/Qwen3-Coder-30B-A3B-Instruct
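Because this repository holds a PEFT adapter rather than full model weights, using it typically means loading the base checkpoint and attaching the adapter on top. A hedged sketch using the `transformers` and `peft` APIs (the repo ids come from this card; the dtype and device settings are illustrative assumptions, and imports are deferred so the sketch reads without the libraries installed):

```python
def load_velocity_foam_qwen(adapter_id="finalform/velocityFoamQwen-30B"):
    """Load the base Qwen3-Coder model and attach this card's PEFT adapter.

    device_map and dtype below are illustrative choices, not values
    recorded in the card.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_id = "Qwen/Qwen3-Coder-30B-A3B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # PeftModel wraps the base model and applies the adapter weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model
```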