BAREC AraELECTRA CORAL Ordinal Regression Model

This model is a fine-tuned version of AraELECTRA for Arabic sentence readability classification using CORAL (Consistent Rank Logits) ordinal regression.

Model Description

  • Base Model: AraELECTRA (state-of-the-art Arabic ELECTRA model)
  • Task: Arabic sentence readability classification (1-19 scale)
  • Approach: CORAL (Consistent Rank Logits) Ordinal Regression
  • Loss Function: CORAL-style rank-consistent ordinal regression loss (coral_pytorch's corn_loss)
  • Target Metric: Quadratic Weighted Kappa (QWK) > 81%

Key Features

  • ELECTRA Architecture: Uses discriminative pre-training for superior token-level understanding
  • Ordinal Structure: Models the ordered nature of readability levels (1 < 2 < ... < 19)
  • CORAL Method: Implements Consistent Rank Logits for proper ordinal regression (see the encoding sketch after this list)
  • QWK Optimized: Designed to minimize large prediction errors that QWK heavily penalizes
  • Arabic Specialized: Uses AraELECTRA backbone optimized for Arabic NLP tasks
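
To make the ordinal encoding concrete: CORAL-style training expands each label on the 1-19 scale into 18 cumulative binary targets ("is the level greater than threshold k?"). A minimal sketch using coral_pytorch's levels_from_labelbatch helper, with illustrative label values:

import torch
from coral_pytorch.dataset import levels_from_labelbatch

# 0-indexed labels for readability levels 1, 5, and 19
labels = torch.tensor([0, 4, 18])
levels = levels_from_labelbatch(labels, num_classes=19)
print(levels.shape)  # torch.Size([3, 18])
print(levels[1])     # level 5 passes the first 4 thresholds: [1, 1, 1, 1, 0, ..., 0]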

Model Architecture

AraELECTRA Encoder (Discriminative Pre-training)
    ↓
[CLS] Token Representation
    ↓
Dropout Layer
    ↓
Linear Layer (768 → 18)   # 18 CORAL logits for 19 classes
    ↓
CORAL Loss Function       # Consistent Rank Logits
    ↓
Ordinal Prediction        # using corn_label_from_logits
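
For intuition, corn_label_from_logits decodes the 18 threshold logits into a single rank. A rough sketch of what that decoding does (prefer the library call in practice):

import torch

def predict_level_from_logits(logits):
    # Logit k models P(level > k | level > k - 1); a cumulative product
    # of the sigmoids gives P(level > k), and the predicted rank is the
    # number of thresholds cleared with probability > 0.5.
    probas = torch.cumprod(torch.sigmoid(logits), dim=1)
    return torch.sum(probas > 0.5, dim=1)

logits = torch.randn(1, 18)               # one sentence, 18 threshold logits
print(predict_level_from_logits(logits))  # 0-indexed level in [0, 18]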

Usage

from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn as nn
from coral_pytorch.dataset import corn_label_from_logits

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("MoT69420/barec-araelectra-coral-ordinal-regression-finetuned")

# Recreate the custom CORAL ordinal regression model.
# This is a minimal sketch; see the training notebook for the exact implementation.
class AraELECTRACORALRegression(nn.Module):
    def __init__(self, model_name, num_classes=19, dropout=0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(dropout)
        # CORAL uses num_classes - 1 logits (18 for 19 readability levels)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_classes - 1)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.encoder(input_ids=input_ids,
                               attention_mask=attention_mask,
                               token_type_ids=token_type_ids)
        cls = self.dropout(outputs.last_hidden_state[:, 0])  # [CLS] representation
        return self.classifier(cls)

# NOTE: depending on how the checkpoint was saved, the classifier head
# weights may need to be restored separately from the repo's state dict.
model = AraELECTRACORALRegression("MoT69420/barec-araelectra-coral-ordinal-regression-finetuned")
model.eval()

# Predict readability
text = "هذا نص باللغة العربية"  # "This is a text in Arabic"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs)
predicted_level = corn_label_from_logits(logits).item() + 1  # map 0-18 to the 1-19 scale

Training Details

  • Training Data: BAREC 2025 Combined Dataset
  • Validation Data: BAREC 2025 Test Set
  • Optimization: CORAL ordinal regression loss
  • Epochs: 5
  • Batch Size: 32
  • Learning Rate: 2e-5
  • Framework: PyTorch Lightning (see the training-step sketch below)
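
A minimal PyTorch Lightning sketch consistent with these settings; the module name and batch keys are illustrative, not the exact notebook code:

import pytorch_lightning as pl
import torch
from coral_pytorch.losses import corn_loss

class BarecCoralModule(pl.LightningModule):
    def __init__(self, model, num_classes=19, lr=2e-5):
        super().__init__()
        self.model = model          # e.g. an AraELECTRACORALRegression instance
        self.num_classes = num_classes
        self.lr = lr

    def training_step(self, batch, batch_idx):
        logits = self.model(batch["input_ids"], batch["attention_mask"])
        # CORN loss takes 0-indexed integer class labels directly
        loss = corn_loss(logits, batch["labels"], num_classes=self.num_classes)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)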

Performance

This model is designed to achieve higher QWK scores than standard classification approaches by:

  1. Using ELECTRA's discriminative pre-training for better token understanding
  2. Respecting the ordinal structure of readability levels with CORAL
  3. Minimizing large prediction errors, which QWK penalizes quadratically (illustrated below)
  4. Using established CORAL methodology for consistent rankings
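
For reference, QWK can be computed with scikit-learn; because disagreements are weighted by the square of their distance, a single large miss costs far more than several off-by-one errors (illustrative values):

from sklearn.metrics import cohen_kappa_score

y_true = [3, 7, 12, 15]
y_pred = [3, 8, 12, 19]  # one off-by-one error, one distance-4 error
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"QWK: {qwk:.3f}")  # the distance-4 miss dominates the penalty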

Citation

If you use this model, please cite:

@misc{barec-araelectra-coral-ordinal-regression,
  author    = {BAREC Team},
  title     = {AraELECTRA CORAL Ordinal Regression for Arabic Readability},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/MoT69420/barec-araelectra-coral-ordinal-regression-finetuned}
}