deberta-unfair-tos-augmented

Best-performing model: DeBERTa fine-tuned with augmented data for UNFAIR-ToS classification

Model Description

This model is fine-tuned on the LexGLUE UNFAIR-ToS dataset to detect unfair clauses in Terms of Service documents.

Base Model: microsoft/deberta-base

Performance

Evaluation Metrics:

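  • Exact Match Accuracy: Percentage of samples where the full set of predicted labels exactly matches the ground truth (strict multi-label metric)
  • Micro-F1: Harmonic mean of precision and recall, computed from true/false positive and false negative counts pooled across all labels
  • Micro-Precision: Share of predicted labels that are correct, pooled across all labels

Metric                  Score
Exact Match Accuracy    94.12%
Micro-F1                0.96
Micro-Precision         0.98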
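
To make the metric definitions concrete, here is a minimal sketch of how they could be computed on multi-hot label vectors. The function names and the toy data are illustrative assumptions; this is not the evaluation script that produced the scores above, which may differ in details such as how samples with no labels are treated.

```python
from typing import Sequence, Tuple


def exact_match_accuracy(y_true: Sequence[Sequence[int]],
                         y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose whole predicted label vector equals the truth."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def micro_scores(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1: counts are pooled over all labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy example with 3 labels and 4 samples (made-up data, not model output).
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 1]]

accuracy = exact_match_accuracy(y_true, y_pred)
precision, recall, f1 = micro_scores(y_true, y_pred)
print(f"exact match={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Exact match is the stricter of the two: a sample with one missed or one extra label counts as fully wrong, while micro-averaged scores still credit its correct labels.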

Risk Categories

The model is a multi-label classifier: each input can be assigned any subset of the following 8 risk categories, or none of them.

ID  Category
0   Limitation of liability
1   Unilateral termination
2   Unilateral change
3   Content removal
4   Contract by using
5   Choice of law
6   Jurisdiction
7   Arbitration
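
As a small illustration of how the IDs relate to the model output, the sketch below maps a vector of 8 per-label probabilities back to category names. The names `ID_TO_CATEGORY` and `decode_predictions`, the example probabilities, and the 0.5 threshold are assumptions for illustration; the threshold is not stated to be tuned for this model.

```python
from typing import List, Sequence, Tuple

# ID-to-name mapping copied from the table above.
ID_TO_CATEGORY = {
    0: "Limitation of liability",
    1: "Unilateral termination",
    2: "Unilateral change",
    3: "Content removal",
    4: "Contract by using",
    5: "Choice of law",
    6: "Jurisdiction",
    7: "Arbitration",
}


def decode_predictions(probs: Sequence[float],
                       threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (category, probability) for every label strictly above the threshold."""
    if len(probs) != len(ID_TO_CATEGORY):
        raise ValueError(f"expected {len(ID_TO_CATEGORY)} probabilities")
    return [(ID_TO_CATEGORY[i], p) for i, p in enumerate(probs) if p > threshold]


# Made-up probabilities, one per category ID.
example_probs = [0.10, 0.90, 0.20, 0.05, 0.60, 0.30, 0.50, 0.01]
flagged = decode_predictions(example_probs)
print(flagged)
```

An empty result means no category crossed the threshold, which corresponds to a clause that is not flagged as unfair.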

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Agreemind/deberta-unfair-tos-augmented"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "We reserve the right to terminate your account at any time."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.sigmoid(outputs.logits)

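# Print every category whose probability exceeds the 0.5 threshold.
# The label order follows the category IDs (0-7) listed above.
labels = ["Limitation of liability", "Unilateral termination", "Unilateral change",
          "Content removal", "Contract by using", "Choice of law", "Jurisdiction", "Arbitration"]
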
for label, prob in zip(labels, probs[0]):
    if prob > 0.5:
        print(f"{label}: {prob:.2%}")

Training

Parameter         Value
Dataset           coastalcph/lex_glue (unfair_tos subset)
Training Samples  ~5,532
Loss Function     Focal Loss with class weighting
Optimizer         AdamW with cosine LR schedule
Learning Rate     2e-5 with 10% warmup
Epochs            15 (with early stopping, patience=3)
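
The loss and learning-rate settings in the table can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the training code: the focusing parameter `gamma=2.0`, the per-class `weight`, linear warmup, and decay to zero are all assumed, since the table gives only the loss type, the peak rate (2e-5), and the warmup fraction (10%).

```python
import math


def binary_focal_loss(prob: float, target: int, gamma: float = 2.0,
                      weight: float = 1.0, eps: float = 1e-8) -> float:
    """Focal loss for a single label of a single sample.

    prob   : predicted probability that the label is present
    target : 1 if the label is present, 0 otherwise
    gamma  : focusing parameter (assumed value); 0 recovers cross-entropy
    weight : class weight (assumed; rare classes would receive larger values)
    """
    p_t = prob if target == 1 else 1.0 - prob  # probability given to the true outcome
    return -weight * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def lr_at_step(step: int, total_steps: int, base_lr: float = 2e-5,
               warmup_frac: float = 0.1) -> float:
    """Linear warmup to base_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Well-classified labels contribute far less loss than hard ones.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.6, 1)
print(f"easy={easy:.5f} hard={hard:.5f}")

# Learning rate at a few points of a hypothetical 1000-step run.
for s in (0, 50, 100, 550, 1000):
    print(s, lr_at_step(s, 1000))
```

The `(1 - p_t) ** gamma` factor is what distinguishes focal loss from weighted cross-entropy: it shrinks the contribution of labels the model already predicts confidently, so training focuses on the harder and usually rarer cases.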

Limitations

  • Arbitration class has lower recall (~38%) due to limited training samples
  • Optimized for English legal text

Citation

@misc{agreemind-unfair-tos,
  author = {Agreemind},
  title = {deberta-unfair-tos-augmented},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Agreemind/deberta-unfair-tos-augmented}
}