deberta-unfair-tos-augmented
Best-performing model: DeBERTa trained with augmented data for UNFAIR-ToS classification
Model Description
This model is fine-tuned on the LexGLUE UNFAIR-ToS dataset to detect unfair clauses in Terms of Service documents.
Base Model: microsoft/deberta-base
Performance
Evaluation Metrics:
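- Exact Match Accuracy: Percentage of samples whose full set of predicted labels exactly matches the ground truth (a strict multi-label metric)
- Micro-F1: Harmonic mean of precision and recall, with true positives, false positives, and false negatives pooled across all labels
- Micro-Precision: Fraction of predicted labels that are correct, pooled across all labels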
| Metric | Score |
|---|---|
| Exact Match Accuracy | 94.12% |
| Micro-F1 | 0.96 |
| Micro-Precision | 0.98 |
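The metric definitions above can be sketched in plain Python. This is a minimal illustration on made-up toy data, not the evaluation script behind the reported scores; the function names and the `y_true` / `y_pred` arrays are hypothetical.

```python
def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose full multi-hot label vector matches exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def micro_precision_recall_f1(y_true, y_pred):
    """Pool TP/FP/FN over every (sample, label) pair, then compute P, R, F1."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (hypothetical): 4 samples, 3 labels, multi-hot encoded.
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 1]]

accuracy = exact_match_accuracy(y_true, y_pred)
precision, recall, f1 = micro_precision_recall_f1(y_true, y_pred)
print(f"exact match={accuracy:.2f} P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Exact match penalizes a sample for a single wrong label, so it is normally the lowest of these numbers on the same predictions.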
Risk Categories
The model tags each input with any of 8 risk categories (multi-label, so a clause can match several categories or none):
| ID | Category |
|---|---|
| 0 | Limitation of liability |
| 1 | Unilateral termination |
| 2 | Unilateral change |
| 3 | Content removal |
| 4 | Contract by using |
| 5 | Choice of law |
| 6 | Jurisdiction |
| 7 | Arbitration |
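As a small sketch, the table can be expressed as an ID-to-name mapping for decoding model outputs. The names `ID2LABEL` and `decode` are illustrative; the mapping is copied from the table above, and whether the hosted model config exposes the same names under `id2label` is not confirmed here.

```python
# ID -> category name, copied from the table above.
ID2LABEL = {
    0: "Limitation of liability",
    1: "Unilateral termination",
    2: "Unilateral change",
    3: "Content removal",
    4: "Contract by using",
    5: "Choice of law",
    6: "Jurisdiction",
    7: "Arbitration",
}


def decode(probs, threshold=0.5):
    """Return the names of all categories whose probability exceeds threshold.

    `probs` is a sequence of 8 per-label probabilities, ordered by class ID.
    """
    if len(probs) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} probabilities, got {len(probs)}")
    return [ID2LABEL[i] for i, p in enumerate(probs) if p > threshold]


# Hypothetical probabilities for a single clause.
print(decode([0.10, 0.90, 0.20, 0.60, 0.00, 0.00, 0.00, 0.50]))
```

An empty list means no category cleared the threshold.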
Usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Agreemind/deberta-unfair-tos-augmented"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "We reserve the right to terminate your account at any time."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    # Multi-label task: apply an independent sigmoid to each label's logit
    probs = torch.sigmoid(outputs.logits)

# Label names, ordered by class ID
labels = ["Limitation of liability", "Unilateral termination", "Unilateral change",
          "Content removal", "Contract by using", "Choice of law", "Jurisdiction", "Arbitration"]

# Print every label whose probability exceeds the 0.5 threshold
for label, prob in zip(labels, probs[0]):
    if prob > 0.5:
        print(f"{label}: {prob.item():.2%}")
Training
| Parameter | Value |
|---|---|
| Dataset | coastalcph/lex_glue (unfair_tos subset) |
| Training Samples | ~5,532 |
| Loss Function | Focal Loss with class weighting |
| Optimizer | AdamW with cosine LR schedule |
| Learning Rate | 2e-5 with 10% warmup |
| Epochs | 15 (with early stopping, patience=3) |
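To make two rows of the table concrete, here is a minimal pure-Python sketch of a class-weighted focal loss for multi-label targets and of a cosine learning-rate schedule with linear warmup. It is not the training code behind this model. The focusing parameter `gamma=2.0`, the per-label weights, the step counts, and the way weights multiply each label's term are all assumptions; the table does not specify them. Only the 2e-5 peak rate and the 10% warmup come from the table.

```python
import math


def focal_loss(probs, targets, class_weights, gamma=2.0):
    """Class-weighted binary focal loss, averaged over labels.

    probs: per-label sigmoid probabilities; targets: 0/1 per label.
    gamma and class_weights are assumed values, not taken from the model card.
    """
    total = 0.0
    for p, t, w in zip(probs, targets, class_weights):
        p_t = p if t == 1 else 1.0 - p      # probability given to the true outcome
        p_t = min(max(p_t, 1e-7), 1.0)      # clamp to avoid log(0)
        total += -w * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def lr_at(step, total_steps, base_lr=2e-5, warmup_frac=0.10):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical example: one clause, three labels, the rare label up-weighted.
loss = focal_loss(probs=[0.9, 0.2, 0.3], targets=[1, 0, 1], class_weights=[1.0, 1.0, 4.0])
print(f"focal loss: {loss:.4f}")
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step, total_steps=1000):.2e}")
```

With `gamma=0` and unit weights the loss reduces to ordinary binary cross-entropy. Larger `gamma` shrinks the contribution of labels the model already predicts confidently, which is the usual motivation for focal loss on imbalanced data.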
Limitations
- The Arbitration class has lower recall (~38%) due to limited training samples
- Optimized for English legal text
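A per-class recall figure like the one quoted for Arbitration can be computed as in the sketch below. The function name and toy arrays are hypothetical and do not reproduce the reported number.

```python
def per_label_recall(y_true, y_pred, num_labels):
    """Recall for each label: TP / (TP + FN).

    Returns None for a label with no positive examples, since recall is
    undefined in that case.
    """
    recalls = []
    for j in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] and p[j])
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] and not p[j])
        recalls.append(tp / (tp + fn) if (tp + fn) else None)
    return recalls


# Toy data (hypothetical): 4 samples, 3 labels, multi-hot encoded.
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]

recalls = per_label_recall(y_true, y_pred, num_labels=3)
print(recalls)
```

Reporting recall per label alongside the micro-averaged scores makes weak classes visible. Micro averages are dominated by the frequent labels and can hide them.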
Citation
@misc{agreemind-unfair-tos,
  author = {Agreemind},
  title = {deberta-unfair-tos-augmented},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Agreemind/deberta-unfair-tos-augmented}
}