BAREC AraELECTRA CORAL Ordinal Regression Model

This model, published as MoT69420/barec-araelectra-coral-ordinal-regression-finetuned, is an AraELECTRA model fine-tuned for Arabic sentence readability classification using CORAL (Consistent Rank Logits) ordinal regression.

Model Description

Base Model: MoT69420/barec-araelectra-coral-ordinal-regression-finetuned (AraELECTRA - state-of-the-art Arabic ELECTRA model)
Task: Arabic sentence readability classification (1-19 scale)
Approach: CORAL (Consistent Rank Logits) Ordinal Regression
Loss Function: CORAL loss function for ordinal regression
Target Metric: Quadratic Weighted Kappa (QWK) above 0.81 (81%)

Key Features

ELECTRA Architecture: Uses ELECTRA's discriminative (replaced-token-detection) pre-training, which provides a training signal at every token
Ordinal Structure: Models the ordered nature of readability levels (1 < 2 < ... < 19)
CORAL Method: Implements Consistent Rank Logits for proper ordinal regression
QWK Oriented: The ordinal loss discourages predictions far from the true level, which QWK penalizes quadratically
Arabic Specialized: Uses AraELECTRA backbone optimized for Arabic NLP tasks
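
To make the QWK point concrete, here is a minimal pure-Python sketch of quadratic weighted kappa. The helper name quadratic_weighted_kappa and the toy labels are illustrative and not taken from the model's training code. It compares one near miss against one far miss on the 1-19 scale.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes=19):
    """QWK for integer labels in 1..num_classes (hypothetical helper).

    Undefined (division by zero) if all labels and predictions
    fall into one single class.
    """
    n = len(y_true)
    # Observed confusion matrix: rows are true levels, columns are predictions
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t - 1][p - 1] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            # Quadratic weight: grows with the squared distance between levels
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * true_hist[i] * pred_hist[j] / n
    return 1.0 - weighted_observed / weighted_expected


y_true = [1, 5, 10, 15, 19]
qwk_perfect = quadratic_weighted_kappa(y_true, y_true)
qwk_near = quadratic_weighted_kappa(y_true, [1, 5, 10, 15, 18])  # off by 1
qwk_far = quadratic_weighted_kappa(y_true, [1, 5, 10, 15, 9])    # off by 10
print(qwk_perfect, qwk_near, qwk_far)
```

Both imperfect runs contain exactly one wrong prediction, yet the far miss lowers the score much more than the near miss. This is the behaviour an ordinal loss is meant to exploit.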

Model Architecture

AraELECTRA Encoder (Discriminative Pre-training)
    ↓
[CLS] Token Representation
    ↓
Dropout Layer
    ↓
Linear Layer (768 → 18) # 18 CORAL logits for 19 classes
    ↓
CORAL Loss Function # Consistent Rank Logits
    ↓
Ordinal Prediction # Using corn_label_from_logits, then + 1 for the 1-19 scale
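
The pipeline above can be sketched as a small PyTorch module. This is an illustration under stated assumptions, not the author's AraELECTRACORALRegression class: the class name OrdinalHead, the dropout rate of 0.1, and the random tensor standing in for the encoder output are all hypothetical.

```python
import torch
import torch.nn as nn


class OrdinalHead(nn.Module):
    """Hypothetical head: [CLS] vector -> dropout -> 18 ordinal logits."""

    def __init__(self, hidden_size=768, num_classes=19, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        # One logit per threshold between adjacent levels: 19 classes -> 18 logits
        self.classifier = nn.Linear(hidden_size, num_classes - 1)

    def forward(self, last_hidden_state):
        cls = last_hidden_state[:, 0]  # [CLS] is the first token in the sequence
        return self.classifier(self.dropout(cls))


# Stand-in for the encoder output: 2 sentences, 16 tokens, 768 dimensions
fake_hidden = torch.randn(2, 16, 768)
head = OrdinalHead().eval()  # eval() disables dropout
with torch.no_grad():
    logits = head(fake_hidden)
print(logits.shape)
```

In a full model, the stand-in tensor would be replaced by the last hidden state of the AraELECTRA encoder.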

Usage

from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn as nn
from coral_pytorch.dataset import corn_label_from_logits
from coral_pytorch.losses import corn_loss

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("MoT69420/barec-araelectra-coral-ordinal-regression-finetuned")

# Load model (note: this is a custom ordinal regression model, not a stock transformers class)
# You'll need to recreate the AraELECTRACORALRegression class from the notebook,
# including its from_pretrained() loader, which nn.Module does not provide by itself
class AraELECTRACORALRegression(nn.Module):
    ...  # implementation as shown in the notebook

model = AraELECTRACORALRegression.from_pretrained("MoT69420/barec-araelectra-coral-ordinal-regression-finetuned")
model.eval()  # disable dropout for inference

# Predict readability
text = "هذا نص باللغة العربية"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
# corn_label_from_logits returns a 0-based rank (0-18); add 1 for the 1-19 scale
predicted_level = corn_label_from_logits(outputs.logits).item() + 1
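
To show what the decoding step does, below is a plain-Python sketch of the rule corn_label_from_logits is understood to apply: take the sigmoid of each logit, form the running product, and count how many running products stay above 0.5. The helper name decode_rank and the example logits are made up for illustration. The real function operates on batched tensors.

```python
import math


def decode_rank(logits):
    """Hypothetical single-example re-implementation of CORN-style decoding."""
    rank = 0
    cumulative = 1.0
    for z in logits:
        # Chain the conditional probabilities P(rank > k | rank > k - 1)
        cumulative *= 1.0 / (1.0 + math.exp(-z))
        if cumulative > 0.5:
            rank += 1
    return rank


# 18 logits: confident "yes" for the first 6 thresholds, "no" for the rest
example_logits = [4.0] * 6 + [-4.0] * 12
level = decode_rank(example_logits) + 1  # shift 0-18 to the 1-19 scale
print(level)  # 7
```

Because the running product can only shrink, the count stops growing once it drops below 0.5. This keeps the 18 threshold decisions consistent with one another.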

Training Details

Training Data: BAREC 2025 Combined Dataset
Validation Data: BAREC 2025 Test Set
Optimization: CORAL ordinal regression loss
Epochs: 5
Batch Size: 32
Learning Rate: 2e-5
Framework: PyTorch Lightning
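
The card names the loss only as a CORAL ordinal regression loss, while the usage snippet imports corn_loss from coral_pytorch. Assuming the CORN formulation is what was trained, the loss can be sketched in plain Python as follows. The function names and toy inputs are illustrative, and labels are 0-based ranks (readability level minus 1).

```python
import math


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z))
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def corn_style_loss(batch_logits, ranks, num_classes=19):
    """Hypothetical pure-Python version of a CORN-style loss.

    batch_logits: list of rows, each with num_classes - 1 logits
    ranks: 0-based integer labels in [0, num_classes - 1]
    """
    total, count = 0.0, 0
    for k in range(num_classes - 1):
        for logits, y in zip(batch_logits, ranks):
            if y < k:
                continue  # task k only sees examples whose rank is at least k
            target = 1.0 if y > k else 0.0
            z = logits[k]
            # Binary cross-entropy for the conditional question "is rank > k?"
            total -= target * log_sigmoid(z) + (1.0 - target) * log_sigmoid(-z)
            count += 1
    return total / count


ranks = [6]  # readability level 7
loss_good = corn_style_loss([[4.0] * 6 + [-4.0] * 12], ranks)
loss_bad = corn_style_loss([[-4.0] * 18], ranks)
print(loss_good, loss_bad)
```

The logits that agree with the label on every threshold yield a much smaller loss than the logits that answer "no" everywhere.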

Performance

This model is intended to achieve higher QWK than standard classification approaches by:

Using ELECTRA's discriminative pre-training for better token understanding
Respecting the ordinal structure of readability levels with CORAL
Minimizing large prediction errors through proper ordinal loss
Using established CORAL methodology for consistent rankings

Citation

If you use this model, please cite:

@misc{barec-araelectra-coral-ordinal-regression,
  author = {BAREC Team},
  title = {AraELECTRA CORAL Ordinal Regression for Arabic Readability},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/MoT69420/barec-araelectra-coral-ordinal-regression-finetuned}
}
