BAREC AraELECTRA CORAL Ordinal Regression Model
This model is AraELECTRA fine-tuned for Arabic sentence readability classification on a 19-level scale, using CORAL (Consistent Rank Logits) ordinal regression. It is published as MoT69420/barec-araelectra-coral-ordinal-regression-finetuned.
Model Description
- Base Model: AraELECTRA, an ELECTRA model pre-trained on Arabic text (the card metadata lists MoT69420/barec-araelectra-coral-ordinal-regression-finetuned itself as the base)
- Task: Arabic sentence readability classification (1-19 scale)
- Approach: CORAL (Consistent Rank Logits) Ordinal Regression
- Loss Function: CORAL loss function for ordinal regression
- Target Metric: Quadratic Weighted Kappa (QWK) > 0.81 (81%)
Key Features
- ELECTRA Architecture: Uses an encoder pre-trained discriminatively (replaced-token detection), which provides a training signal at every token position
- Ordinal Structure: Models the ordered nature of readability levels (1 < 2 < ... < 19)
- CORAL Method: Implements Consistent Rank Logits for proper ordinal regression
- QWK Optimized: Designed to minimize large prediction errors that QWK heavily penalizes
- Arabic Specialized: Uses AraELECTRA backbone optimized for Arabic NLP tasks
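To make the ordinal structure concrete: CORAL-style methods recast a K-class ordinal label as K-1 binary questions of the form "is the level greater than k?". The sketch below shows that encoding for the 19 readability levels. The helper names `encode_level` and `decode_levels` are illustrative and are not taken from the model's code.

```python
NUM_LEVELS = 19  # readability levels 1..19


def encode_level(level: int, num_levels: int = NUM_LEVELS) -> list[int]:
    """Turn a 1-based level into num_levels - 1 binary targets.

    Target k (0-based) answers the question "is level > k + 1?".
    """
    if not 1 <= level <= num_levels:
        raise ValueError(f"level must be in 1..{num_levels}, got {level}")
    return [1 if level > k + 1 else 0 for k in range(num_levels - 1)]


def decode_levels(targets: list[int]) -> int:
    """Invert encode_level: count the positive answers and add 1."""
    return sum(targets) + 1


if __name__ == "__main__":
    print(encode_level(1))   # eighteen zeros
    print(encode_level(4))   # three ones followed by fifteen zeros
    print(decode_levels(encode_level(19)))
```

Because the targets for level 4 are a prefix of the targets for level 5, a prediction that is off by one level disagrees with the truth on a single binary question, while a prediction that is off by ten disagrees on ten.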
Model Architecture
AraELECTRA Encoder (Discriminative Pre-training)
↓
[CLS] Token Representation
↓
Dropout Layer
↓
Linear Layer (768 → 18) # 18 CORAL logits for 19 classes
↓
CORAL Loss Function # Consistent Rank Logits
↓
Ordinal Prediction # Using corn_label_from_logits
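The diagram above could be realized roughly as follows. This is a minimal sketch and not the author's `AraELECTRACORALRegression` class: the name `OrdinalHead`, the dropout rate of 0.1, and the assumption that the encoder returns hidden states of shape (batch, seq_len, 768) are all illustrative. A random tensor stands in for the encoder output so the block runs without downloading weights.

```python
import torch
import torch.nn as nn


class OrdinalHead(nn.Module):
    """Hypothetical head: [CLS] vector -> dropout -> 18 ordinal logits."""

    def __init__(self, hidden_size: int = 768, num_levels: int = 19, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        # K - 1 logits for K ordered classes
        self.classifier = nn.Linear(hidden_size, num_levels - 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        cls_vector = hidden_states[:, 0, :]  # the first token is [CLS]
        return self.classifier(self.dropout(cls_vector))


head = OrdinalHead().eval()             # eval() disables dropout
dummy_hidden = torch.randn(2, 16, 768)  # stand-in for encoder output
with torch.no_grad():
    logits = head(dummy_hidden)
print(logits.shape)  # torch.Size([2, 18])
```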
Usage
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn as nn
from coral_pytorch.dataset import corn_label_from_logits
from coral_pytorch.losses import corn_loss  # only needed for training

MODEL_ID = "MoT69420/barec-araelectra-coral-ordinal-regression-finetuned"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Load model (note: this is a custom CORAL ordinal regression model, not a stock transformers class)
# You'll need to recreate the AraELECTRACORALRegression class, including a
# from_pretrained loader, which nn.Module does not provide on its own
class AraELECTRACORALRegression(nn.Module):
    ...  # implementation as shown in the notebook

model = AraELECTRACORALRegression.from_pretrained(MODEL_ID)
model.eval()  # disable dropout for inference

# Predict readability
text = "هذا نص باللغة العربية"  # "This is a text in Arabic"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
# corn_label_from_logits returns a 0-based rank; add 1 to convert to the 1-19 scale
predicted_level = corn_label_from_logits(outputs.logits).item() + 1
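On where the `+ 1` comes from: as implemented in coral_pytorch at the time of writing, `corn_label_from_logits` appears to apply a sigmoid to each logit, take the running product, and count how many of the cumulative probabilities exceed 0.5, which yields a 0-based rank. The pure-Python sketch below mirrors that logic for a single example. `decode_rank` is a hypothetical name and the logits are made up; the library remains the reference.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_rank(logits: list[float], threshold: float = 0.5) -> int:
    """Return a 0-based rank from K - 1 conditional logits."""
    rank = 0
    cumulative = 1.0
    for z in logits:
        cumulative *= sigmoid(z)  # running product of conditional probabilities
        if cumulative > threshold:
            rank += 1
    return rank


# Made-up logits: confident about the first three thresholds, not the rest
example_logits = [4.0, 3.0, 2.5] + [-3.0] * 15
level = decode_rank(example_logits) + 1  # shift the 0-based rank to the 1-19 scale
print(level)  # 4
```

Since the running product can only shrink, once it drops below the threshold it stays there, so the count always corresponds to a contiguous run of "yes" answers.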
Training Details
- Training Data: BAREC 2025 Combined Dataset
- Validation Data: BAREC 2025 Test Set
- Optimization: CORAL ordinal regression loss
- Epochs: 5
- Batch Size: 32
- Learning Rate: 2e-5
- Framework: PyTorch Lightning
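For reference, the CORAL formulation trains each of the 18 logits as a binary classifier against the extended binary targets, and the per-example loss is the sum of the 18 binary cross-entropies. The sketch below follows that definition in plain Python. The function name `coral_loss_single` is hypothetical, and the training code used for this model may differ (for instance, the usage snippet imports `corn_loss`).

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def coral_loss_single(logits: list[float], level: int) -> float:
    """CORAL-style loss for one example with a 1-based target level."""
    loss = 0.0
    for k, z in enumerate(logits):
        target = 1.0 if level > k + 1 else 0.0
        # log P(y > k+1) = log_sigmoid(z); log P(y <= k+1) = log_sigmoid(z) - z
        loss -= target * log_sigmoid(z) + (1.0 - target) * (log_sigmoid(z) - z)
    return loss


good = [5.0] * 3 + [-5.0] * 15  # logits consistent with level 4
bad = [-5.0] * 3 + [5.0] * 15   # logits pointing the wrong way
print(coral_loss_single(good, level=4) < coral_loss_single(bad, level=4))  # True
```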
Performance
This model is designed to achieve higher QWK scores than standard classification approaches by:
- Using ELECTRA's discriminative pre-training for better token understanding
- Respecting the ordinal structure of readability levels with CORAL
- Minimizing large prediction errors through proper ordinal loss
- Using established CORAL methodology for consistent rankings
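To illustrate why large errors matter more than small ones under QWK, here is a dependency-free sketch of the metric applied to toy predictions. The labels are invented for illustration and are not BAREC results. In practice, `sklearn.metrics.cohen_kappa_score(y_true, y_pred, weights="quadratic")` computes the same quantity.

```python
def quadratic_weighted_kappa(y_true: list[int], y_pred: list[int], num_levels: int = 19) -> float:
    """QWK for 1-based integer labels in 1..num_levels."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed confusion matrix and its row/column totals
    observed = [[0] * num_levels for _ in range(num_levels)]
    for t, p in zip(y_true, y_pred):
        observed[t - 1][p - 1] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_levels)) for j in range(num_levels)]
    numerator = 0.0
    denominator = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            weight = (i - j) ** 2 / (num_levels - 1) ** 2  # quadratic penalty
            expected = hist_true[i] * hist_pred[j] / n     # chance agreement
            numerator += weight * observed[i][j]
            denominator += weight * expected
    # Degenerate case: every label and prediction falls in one class
    return 1.0 - numerator / denominator if denominator else 1.0


y_true = [2, 6, 10, 14, 18, 4]
near = [3, 6, 11, 14, 17, 4]  # three errors, each off by one level
far = [12, 6, 1, 14, 8, 4]    # three errors, each off by about ten levels

qwk_near = quadratic_weighted_kappa(y_true, near)
qwk_far = quadratic_weighted_kappa(y_true, far)
print(qwk_near > qwk_far)  # True
```

Both toy prediction sets have the same accuracy (3 of 6 correct), yet their QWK differs sharply, which is the behavior an ordinal loss is meant to exploit.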
Citation
If you use this model, please cite:
@misc{barec-araelectra-coral-ordinal-regression,
  author = {BAREC Team},
  title = {AraELECTRA CORAL Ordinal Regression for Arabic Readability},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/MoT69420/barec-araelectra-coral-ordinal-regression-finetuned}
}