---
license: apache-2.0
datasets:
- AINovice2005/cicflow-ids-multiclass
language:
- en
base_model:
- answerdotai/ModernBERT-base
tags:
- CyberSecurity
- BERT
- LoRA
- PEFT
pipeline_tag: fill-mask
library_name: peft
---
This model fine‑tunes ModernBERT‑base with LoRA (Low‑Rank Adaptation), a parameter‑efficient fine‑tuning method.

It is designed for binary classification tasks where high recall and controlled false positive rates are important.

## Training Configuration
- Seed: 42 (for reproducibility)
- Batch sizes: Train = 128, Eval = 256
- Max sequence length: 256
- Epochs: 1 (baseline run)
- Learning rate: 3e‑4
- Weight decay: 0.01
- Warmup ratio: 0.05
- Gradient clipping: 1.0
- Early stopping patience: 3
- Training steps: 5,241
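The learning rate, warmup ratio and step count above jointly define the learning‑rate schedule. The card does not name the scheduler, so the sketch below assumes the linear warmup‑then‑decay schedule that the Hugging Face `Trainer` uses by default, with the warmup length rounded up as `ceil(total_steps * warmup_ratio)`. The helper names `warmup_steps` and `linear_lr` are hypothetical.

```python
import math


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Number of warmup steps, rounded up (assumed Trainer-style rounding)."""
    return math.ceil(total_steps * warmup_ratio)


def linear_lr(step: int, total_steps: int, peak_lr: float, warmup_ratio: float) -> float:
    """Learning rate at `step` under an assumed linear warmup + linear decay schedule."""
    warmup = warmup_steps(total_steps, warmup_ratio)
    if step < warmup:
        # Ramp linearly from 0 up to peak_lr during warmup
        return peak_lr * step / max(1, warmup)
    # Decay linearly from peak_lr down to 0 at the final step
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup))


TOTAL_STEPS = 5241   # training steps from the card
PEAK_LR = 3e-4       # learning rate from the card
WARMUP_RATIO = 0.05  # warmup ratio from the card

print(warmup_steps(TOTAL_STEPS, WARMUP_RATIO))  # warmup ends at step 263
for step in (0, 131, 263, 2752, 5241):
    print(step, f"{linear_lr(step, TOTAL_STEPS, PEAK_LR, WARMUP_RATIO):.2e}")
```

Under these assumptions the rate peaks at 3e‑4 at step 263 and reaches half its peak at step 2,752, midway through the decay phase.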


## LoRA Setup

- Enabled: Yes
- Rank (r): 8
- Alpha: 16
- Dropout: 0.05
- Target modules: Attention (Wqkv, Wo) and MLP (Wi, Wo) layers
- Max drift ratio: 0.1


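LoRA freezes the pretrained weights and trains only a pair of small low‑rank matrices for each targeted layer; their product, scaled by alpha / r (16 / 8 = 2 here), is added to the frozen weight. This sharply reduces the number of trainable parameters and the optimizer memory needed for fine‑tuning.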
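To make the low‑rank update concrete, here is a minimal NumPy sketch of a LoRA‑adapted linear layer using the rank and alpha listed above. The 768 → 2304 shape is an illustrative assumption (roughly a fused QKV projection in a 768‑wide encoder), and the random weights are placeholders; the zero initialization of `B` follows common LoRA implementations.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha, r):
    """y = x W^T + (alpha / r) * x A^T B^T; W is frozen, only A and B are trained."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)


def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer."""
    return r * (d_in + d_out)


rng = np.random.default_rng(42)
d_in, d_out, r, alpha = 768, 2304, 8, 16  # illustrative shape (assumption); r and alpha from the card

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight (placeholder values)
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialized
x = rng.standard_normal((4, d_in))         # a batch of 4 hidden states

# With B = 0 at initialization, the adapted layer reproduces the frozen layer exactly
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)

full = d_in * d_out
lora = lora_param_count(d_in, d_out, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

For this shape the adapter adds 24,576 trainable parameters against 1,769,472 in the frozen matrix, i.e. 1/72 of the full weight.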


## Loss Function
Training uses Asymmetric Focal Loss, which down‑weights easy negatives so that hard negatives dominate the negative term, while applying little or no focusing to positives. This helps balance recall against the false positive rate.


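- Gamma_pos: 0.0 (no focusing on positives; the positive term is plain cross‑entropy)
- Gamma_neg: 4.0 (strong focusing on negatives; easy negatives are heavily down‑weighted)
- Clip: 0.05 (probability margin on negatives; very easy negatives contribute no loss, which also stabilizes training)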
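To show how the three parameters interact, here is a minimal NumPy sketch of a binary asymmetric focal loss. It assumes the Asymmetric Loss (ASL) formulation of Ridnik et al., in which `clip` shifts negative probabilities down by a margin, `p_m = max(p - clip, 0)`; positives contribute `-(1 - p)^gamma_pos * log(p)` and negatives `-p_m^gamma_neg * log(1 - p_m)`. The training code itself is not part of this card, so treat this as a sketch rather than the exact implementation used.

```python
import numpy as np


def asymmetric_focal_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05, eps=1e-8):
    """Mean binary asymmetric focal loss (ASL-style sketch; assumed formulation)."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    p = 1.0 / (1.0 + np.exp(-logits))  # predicted probability of the positive class

    # Probability shifting: negatives predicted below `clip` contribute zero loss
    p_m = np.maximum(p - clip, 0.0) if clip > 0 else p

    # Positive term; gamma_pos = 0 leaves it as plain cross-entropy
    loss_pos = targets * (1.0 - p) ** gamma_pos * np.log(np.clip(p, eps, 1.0))
    # Negative term; a large gamma_neg suppresses easy negatives
    loss_neg = (1.0 - targets) * p_m ** gamma_neg * np.log(np.clip(1.0 - p_m, eps, 1.0))
    return float(-(loss_pos + loss_neg).mean())


def logit(p):
    """Inverse sigmoid, used to build inputs with a chosen probability."""
    return np.log(p / (1.0 - p))


# Compare against plain binary cross-entropy for a few single examples
for label, p, y in [("positive", 0.5, 1), ("easy negative", 0.3, 0), ("hard negative", 0.9, 0)]:
    asl = asymmetric_focal_loss([logit(p)], [y])
    ce = -np.log(p if y == 1 else 1.0 - p)
    print(f"{label:14s} p={p:.2f}  ASL={asl:.4f}  BCE={ce:.4f}")
```

With these settings the positive example keeps its full cross‑entropy, the easy negative (p = 0.3) is almost entirely suppressed, and the hard negative (p = 0.9) retains a large share of its loss, which is the recall/false‑positive trade‑off described above.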

Validation is performed every 5000 steps, with early stopping to prevent overfitting.
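The card does not say which metric drives early stopping. The sketch below illustrates patience‑based stopping under the assumption of a higher‑is‑better validation metric, where any evaluation that fails to beat the best score so far counts as a strike; `eval_steps` and `stop_index` are hypothetical helpers and the scores are made‑up inputs, not logged results.

```python
from typing import List, Optional


def eval_steps(total_steps: int, eval_every: int) -> List[int]:
    """Steps at which a fixed-interval validation schedule would run."""
    return list(range(eval_every, total_steps + 1, eval_every))


def stop_index(scores: List[float], patience: int) -> Optional[int]:
    """Index of the evaluation that triggers early stopping, or None.

    Assumes a higher-is-better metric: the strike counter resets on a new best,
    and training stops once `patience` consecutive evaluations fail to improve.
    """
    best = float("-inf")
    strikes = 0
    for i, score in enumerate(scores):
        if score > best:
            best, strikes = score, 0
        else:
            strikes += 1
            if strikes >= patience:
                return i
    return None


print(eval_steps(5241, 5000))  # -> [5000]
print(stop_index([0.91, 0.93, 0.92, 0.93, 0.90], patience=3))  # -> 4
```

With 5,241 training steps, an every‑5,000‑steps schedule yields a single evaluation at step 5,000, so a patience of 3 only comes into play in longer runs.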

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
from peft import PeftModel

# Base ModernBERT model
base_model_name = "answerdotai/ModernBERT-base"

# LoRA adapter checkpoint
adapter_model_name = "AINovice2005/ModernBERT-base-lora-cicflow-1m-r8"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load base masked language model
base_model = AutoModelForMaskedLM.from_pretrained(base_model_name)

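# Attach the LoRA adapter, then merge it into the base weights so the
# pipeline receives a plain transformers model
model = PeftModel.from_pretrained(base_model, adapter_model_name)
model = model.merge_and_unload()
model.eval()

# Move to device
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Build fill-mask pipeline (0 = first GPU, -1 = CPU)
fill_mask = pipeline(
    "fill-mask",
    model=model,
    tokenizer=tokenizer,
    device=0 if device == "cuda" else -1,
)

# Example usage: take the mask token from the tokenizer instead of hard-coding it
text = f"The network traffic shows a {tokenizer.mask_token} pattern."
outputs = fill_mask(text)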

for o in outputs:
    print(f"Token: {o['token_str']}, Score: {o['score']:.4f}")
```


## Intended Use
- Binary classification tasks where recall is critical.
- Efficient fine‑tuning scenarios with limited compute resources.
- Research and experimentation with parameter‑efficient methods.

## Artifacts
- LoRA adapter
- Training configuration and evaluation logs