---
license: apache-2.0
datasets:
- AINovice2005/cicflow-ids-multiclass
language:
- en
base_model:
- answerdotai/ModernBERT-base
tags:
- CyberSecurity
- BERT
- LoRA
- PEFT
pipeline_tag: fill-mask
library_name: peft
---

This model fine-tunes ModernBERT-base with LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning method. It is designed for binary classification tasks where high recall and a controlled false positive rate are important.

## Training Configuration

- Seed: 42 (for reproducibility)
- Batch sizes: Train = 128, Eval = 256
- Max sequence length: 256
- Epochs: 1 (baseline run)
- Learning rate: 3e-4
- Weight decay: 0.01
- Warmup ratio: 0.05
- Gradient clipping: 1.0
- Early stopping patience: 3
- Steps: 5,241

## LoRA Setup

- Enabled: Yes
- Rank (r): 8
- Alpha: 16
- Dropout: 0.05
- Target modules: Attention (Wqkv, Wo) and MLP (Wi, Wo) layers
- Max drift ratio: 0.1

LoRA adapters allow efficient fine-tuning by updating only small low-rank matrices, which reduces memory and compute requirements.

## Loss Function

Training uses Asymmetric Focal Loss, which emphasizes hard negatives while keeping positive weighting mild. This helps balance recall against the false positive rate.

- Gamma_pos: 0.0 (minimal emphasis on positives)
- Gamma_neg: 4.0 (stronger emphasis on negatives)
- Clip: 0.05 (stability for probabilities)

Validation is performed every 5,000 steps, with early stopping to prevent overfitting.

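
The asymmetric focal loss described above can be illustrated with a small per-example sketch. It assumes the commonly used asymmetric loss formulation, in which `clip` acts as a probability margin subtracted from the prediction for negative examples; the exact training implementation is not shown in this card, and the function name `asymmetric_focal_loss` is hypothetical.

```python
import math


def asymmetric_focal_loss(
    logit: float,
    target: int,
    gamma_pos: float = 0.0,
    gamma_neg: float = 4.0,
    clip: float = 0.05,
    eps: float = 1e-8,
) -> float:
    """Per-example asymmetric focal loss for a binary label (0 or 1).

    Assumption: this follows the widely used asymmetric loss formulation;
    the training code behind this card may differ in detail.
    """
    # Numerically stable sigmoid: probability of the positive class.
    p = 0.5 * (1.0 + math.tanh(0.5 * logit))

    if target == 1:
        # Positive term. With gamma_pos = 0 the focal weight is 1,
        # so this reduces to plain cross-entropy.
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))

    # Negative term. Shift the probability down by `clip` so that
    # negatives predicted below the margin contribute zero loss.
    p_shifted = max(p - clip, 0.0)
    # gamma_neg > 0 down-weights easy negatives and keeps hard ones.
    return -(p_shifted ** gamma_neg) * math.log(max(1.0 - p_shifted, eps))
```

With the defaults above, a positive example at `logit = 0.0` yields the ordinary cross-entropy value `ln 2` (about 0.693), while a confidently rejected negative (for example `logit = -5.0`) falls under the clip margin and contributes no loss.
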
## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
from peft import PeftModel

# Base ModernBERT model
base_model_name = "answerdotai/ModernBERT-base"

# LoRA adapter checkpoint
adapter_model_name = "AINovice2005/ModernBERT-base-lora-cicflow-1m-r8"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load base masked language model
base_model = AutoModelForMaskedLM.from_pretrained(base_model_name)

# Attach LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_model_name)

# Move to device
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Build fill-mask pipeline
fill_mask = pipeline(
    "fill-mask",
    model=model,
    tokenizer=tokenizer,
    device=0 if device == "cuda" else -1,
)

# Example usage (use the tokenizer's own mask token rather than hard-coding it)
text = f"The network traffic shows a {tokenizer.mask_token} pattern."
outputs = fill_mask(text)

for o in outputs:
    print(f"Token: {o['token_str']}, Score: {o['score']:.4f}")
```

## Intended Use

- Binary classification tasks where recall is critical.
- Efficient fine-tuning scenarios with limited compute resources.
- Research and experimentation with parameter-efficient methods.

## Artifacts

- LoRA adapter
- Training configuration and evaluation logs
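
For readers who want to train a comparable adapter, the values listed under LoRA Setup could be expressed roughly as follows. This is a sketch, not the original training script: the names `LORA_KWARGS` and `build_lora_config` are hypothetical, the module names assume ModernBERT's layer naming (`Wqkv`, `Wo`, `Wi`), and the "max drift ratio" setting is omitted because it is not a `peft.LoraConfig` parameter.

```python
# Keyword arguments mirroring the LoRA Setup values in this card.
LORA_KWARGS = {
    "r": 8,                # rank of the low-rank update matrices
    "lora_alpha": 16,      # scaling numerator; effective scale is alpha / r
    "lora_dropout": 0.05,  # dropout applied on the LoRA branch
    # Assumed module-name suffixes: "Wo" matches both the attention
    # and the MLP output projections.
    "target_modules": ["Wqkv", "Wo", "Wi"],
}


def build_lora_config():
    """Build a peft LoraConfig from LORA_KWARGS (requires `peft`)."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)


# The low-rank update is scaled by alpha / r before being added
# to the frozen weights.
lora_scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]
```

With rank 8 and alpha 16, the adapter update is scaled by a factor of 2. The resulting config would typically be passed to `peft.get_peft_model` together with the base model.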