llama71b-mentalchat16k

This model is a fine-tuned version of meta-llama/Llama-3.1-70B-Instruct on the ShenLab/MentalChat16k dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6542
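The framework list at the bottom of this card includes PEFT, which suggests the published weights are a parameter-efficient adapter rather than a full checkpoint. Below is a minimal inference sketch under that assumption, loading the base model and attaching the adapter; the prompt text, dtype, device placement, and generation settings are illustrative, not taken from the card.

```python
# Hedged sketch: load the Llama-3.1-70B-Instruct base and attach this adapter
# with PEFT. Model and adapter IDs come from this card; everything else
# (dtype, device map, prompt, generation settings) is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-70B-Instruct"
adapter_id = "advy/llama71b-mentalchat16k"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

messages = [{"role": "user", "content": "I've been feeling overwhelmed lately."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```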

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them appears after the list):

  • learning_rate: 8e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 3
  • mixed_precision_training: Native AMP
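For reference, here is a minimal sketch of how the hyperparameters above might map onto transformers TrainingArguments. The actual training script, data preprocessing, and LoRA/PEFT configuration are not documented in this card, so everything beyond the listed values is an assumption.

```python
# Hedged sketch: the hyperparameters above expressed as transformers
# TrainingArguments. Trainer class, LoRA config, and data collation for this
# card are undocumented and therefore assumed here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama71b-mentalchat16k",
    learning_rate=8e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,   # effective total train batch size: 8
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    seed=42,
    bf16=True,                       # stands in for "Native AMP" mixed precision; fp16=True on older GPUs
    logging_steps=100,
    eval_strategy="steps",           # evaluation every 100 steps, matching the results table below
    eval_steps=100,
)
```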

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.8207        | 0.1496 | 100  | 0.7920          |
| 0.7716        | 0.2992 | 200  | 0.7492          |
| 0.7208        | 0.4488 | 300  | 0.7363          |
| 0.7237        | 0.5985 | 400  | 0.7187          |
| 0.7156        | 0.7481 | 500  | 0.7088          |
| 0.7024        | 0.8977 | 600  | 0.6963          |
| 0.6125        | 1.0464 | 700  | 0.7004          |
| 0.5753        | 1.1960 | 800  | 0.6942          |
| 0.5497        | 1.3456 | 900  | 0.6878          |
| 0.5589        | 1.4952 | 1000 | 0.6804          |
| 0.5453        | 1.6448 | 1100 | 0.6761          |
| 0.5316        | 1.7945 | 1200 | 0.6693          |
| 0.5422        | 1.9441 | 1300 | 0.6634          |
| 0.3490        | 2.0928 | 1400 | 0.7011          |
| 0.3481        | 2.2424 | 1500 | 0.7033          |
| 0.3370        | 2.3920 | 1600 | 0.7048          |
| 0.3505        | 2.5416 | 1700 | 0.7049          |
| 0.3424        | 2.6912 | 1800 | 0.7052          |

Llama 3.1 test set evaluation

ROUGE scores (average F-measure):

  • ROUGE-1: 0.3051
  • ROUGE-2: 0.1122
  • ROUGE-L: 0.1678

BLEU score:

  • BLEU: 0.0646
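The evaluation script itself is not included in this card. A minimal sketch of how per-example generations could be scored with the evaluate library to obtain averages like those above; the metric configuration and data handling are assumptions, not the card's actual pipeline.

```python
# Hedged sketch: average ROUGE F-measures and BLEU over test-set generations
# using the `evaluate` library. This only illustrates one common way to
# compute such scores; the card's evaluation setup is undocumented.
import evaluate

predictions = ["model response ..."]    # generated answers for the test split
references = ["reference answer ..."]   # gold answers from MentalChat16k

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

rouge_scores = rouge.compute(predictions=predictions, references=references)
bleu_scores = bleu.compute(predictions=predictions, references=references)

print(f"ROUGE-1: {rouge_scores['rouge1']:.4f}")
print(f"ROUGE-2: {rouge_scores['rouge2']:.4f}")
print(f"ROUGE-L: {rouge_scores['rougeL']:.4f}")
print(f"BLEU:    {bleu_scores['bleu']:.4f}")
```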

Framework versions

  • PEFT 0.18.0
  • Transformers 4.57.1
  • PyTorch 2.5.1+cu124
  • Datasets 4.4.1
  • Tokenizers 0.22.1