## Model Description

This model is a fine-tuned version of the [BERT multilingual base model](https://huggingface.co/google-bert/bert-base-multilingual-cased), trained jointly for Greek news-topic classification and named entity recognition.
## Dataset

The model was fine-tuned on the GreekNews-20k dataset.
## Results

Performance on the GreekNews-20k dataset (news-topic classification):
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Αυτοκίνητο | 0.84 | 0.94 | 0.89 | 201 |
| Επιχειρήσεις και βιομηχανία | 0.62 | 0.75 | 0.68 | 369 |
| Έγκλημα και δικαιοσύνη | 0.83 | 0.89 | 0.86 | 314 |
| Ειδήσεις για καταστροφές και έκτακτες ανάγκες | 0.87 | 0.67 | 0.76 | 272 |
| Οικονομικά και χρηματοοικονομικά | 0.70 | 0.69 | 0.70 | 495 |
| Εκπαίδευση | 0.90 | 0.86 | 0.88 | 259 |
| Ψυχαγωγία και πολιτισμός | 0.74 | 0.71 | 0.72 | 251 |
| Περιβάλλον και κλίμα | 0.66 | 0.76 | 0.71 | 292 |
| Οικογένεια και σχέσεις | 0.95 | 0.66 | 0.78 | 294 |
| Μόδα | 0.74 | 0.94 | 0.83 | 259 |
| Τρόφιμα και ποτά | 0.69 | 0.79 | 0.74 | 262 |
| Υγεία και ιατρική | 0.74 | 0.63 | 0.68 | 346 |
| Μεταφορές και υποδομές | 0.76 | 0.80 | 0.78 | 321 |
| Ψυχική υγεία και ευεξία | 0.68 | 0.78 | 0.72 | 348 |
| Πολιτική και κυβέρνηση | 0.82 | 0.63 | 0.71 | 339 |
| Θρησκεία | 0.94 | 0.86 | 0.89 | 271 |
| Αθλητισμός | 0.99 | 0.95 | 0.97 | 212 |
| Ταξίδια και αναψυχή | 0.77 | 0.83 | 0.80 | 424 |
| Τεχνολογία και επιστήμη | 0.79 | 0.68 | 0.73 | 308 |
| accuracy |  |  | 0.77 | 5837 |
| macro avg | 0.79 | 0.78 | 0.78 | 5837 |
| weighted avg | 0.78 | 0.77 | 0.77 | 5837 |

Performance on the GreekNews-20k dataset (named entity recognition):

| Entity | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| CARDINAL | 0.89 | 0.94 | 0.91 | 16065 |
| DATE | 0.88 | 0.92 | 0.90 | 10273 |
| EVENT | 0.66 | 0.55 | 0.60 | 1288 |
| FAC | 0.45 | 0.43 | 0.44 | 1348 |
| GPE | 0.83 | 0.91 | 0.87 | 10325 |
| LOC | 0.76 | 0.60 | 0.67 | 2197 |
| MONEY | 0.80 | 0.81 | 0.81 | 2359 |
| NORP | 0.90 | 0.85 | 0.88 | 1305 |
| ORDINAL | 0.93 | 0.97 | 0.95 | 2629 |
| ORG | 0.75 | 0.79 | 0.77 | 14768 |
| PERCENT | 0.77 | 0.79 | 0.78 | 4523 |
| PERSON | 0.88 | 0.89 | 0.88 | 10915 |
| PRODUCT | 0.63 | 0.48 | 0.54 | 1346 |
| QUANTITY | 0.70 | 0.68 | 0.69 | 1636 |
| TIME | 0.78 | 0.86 | 0.82 | 1602 |
| micro avg | 0.82 | 0.85 | 0.84 | 127262 |
| macro avg | 0.77 | 0.76 | 0.77 | 127262 |
| weighted avg | 0.82 | 0.85 | 0.83 | 127262 |

Performance on the elNER dataset (named entity recognition):

| Entity | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| CARDINAL | 0.92 | 0.95 | 0.93 | 911 |
| DATE | 0.90 | 0.90 | 0.90 | 838 |
| EVENT | 0.61 | 0.43 | 0.50 | 130 |
| FAC | 0.40 | 0.35 | 0.37 | 77 |
| GPE | 0.81 | 0.90 | 0.85 | 826 |
| LOC | 0.85 | 0.62 | 0.72 | 178 |
| MONEY | 0.96 | 0.95 | 0.95 | 111 |
| NORP | 0.92 | 0.82 | 0.86 | 141 |
| ORDINAL | 0.95 | 0.91 | 0.93 | 172 |
| ORG | 0.75 | 0.73 | 0.74 | 1388 |
| PERCENT | 0.98 | 0.96 | 0.97 | 206 |
| PERSON | 0.90 | 0.90 | 0.90 | 1051 |
| PRODUCT | 0.58 | 0.37 | 0.46 | 83 |
| QUANTITY | 0.77 | 0.78 | 0.78 | 65 |
| TIME | 0.88 | 0.85 | 0.86 | 137 |
| micro avg | 0.85 | 0.84 | 0.84 | 6314 |
| macro avg | 0.81 | 0.76 | 0.78 | 6314 |
| weighted avg | 0.84 | 0.84 | 0.84 | 6314 |
## How to use this model

Install the dependencies:

```bash
pip install transformers torch
```

Load the model and tokenizer (the model's custom code provides the dual classification/NER heads, hence `trust_remote_code=True`):

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("katrjohn/mBertGreekNews", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
```
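Note that BERT-based models accept at most 512 tokens per input, so full-length news articles will likely need truncation. A minimal sketch using the standard tokenizer arguments (`article_text` is a hypothetical placeholder):

```python
# Hedged sketch: truncate long inputs to BERT's 512-token limit.
# `article_text` stands in for any Greek news article string.
inputs = tokenizer(article_text, return_tensors="pt", truncation=True, max_length=512)
```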
### Example usage
```python
import torch

# Classification label dictionary (reverse)
classification_label_dict_reverse = {
    0: "Αυτοκίνητο", 1: "Επιχειρήσεις και βιομηχανία", 2: "Έγκλημα και δικαιοσύνη",
    3: "Ειδήσεις για καταστροφές και έκτακτες ανάγκες", 4: "Οικονομικά και χρηματοοικονομικά", 5: "Εκπαίδευση",
    6: "Ψυχαγωγία και πολιτισμός", 7: "Περιβάλλον και κλίμα", 8: "Οικογένεια και σχέσεις",
    9: "Μόδα", 10: "Τρόφιμα και ποτά", 11: "Υγεία και ιατρική", 12: "Μεταφορές και υποδομές",
    13: "Ψυχική υγεία και ευεξία", 14: "Πολιτική και κυβέρνηση", 15: "Θρησκεία",
    16: "Αθλητισμός", 17: "Ταξίδια και αναψυχή", 18: "Τεχνολογία και επιστήμη"
}

# BIO tag set used by the NER head
ner_label_set = ["PAD", "O",
                 "B-ORG", "I-ORG", "B-PERSON", "I-PERSON", "B-CARDINAL", "I-CARDINAL",
                 "B-GPE", "I-GPE", "B-DATE", "I-DATE", "B-ORDINAL", "I-ORDINAL",
                 "B-PERCENT", "I-PERCENT", "B-LOC", "I-LOC", "B-NORP", "I-NORP",
                 "B-MONEY", "I-MONEY", "B-TIME", "I-TIME", "B-EVENT", "I-EVENT",
                 "B-PRODUCT", "I-PRODUCT", "B-FAC", "I-FAC", "B-QUANTITY", "I-QUANTITY"
                 ]
tag2idx = {t: i for i, t in enumerate(ner_label_set)}
idx2tag = {i: t for t, i in tag2idx.items()}

sentence = "Ο Κυριάκος Μητσοτάκης επισκέφθηκε τη Θεσσαλονίκη για τα εγκαίνια της ΔΕΘ."
inputs = tokenizer(sentence, return_tensors="pt")

# The model returns one set of logits per task
with torch.no_grad():
    classification_logits, ner_logits = model(**inputs)

# Classification
classification_probs = torch.softmax(classification_logits, dim=-1)
predicted_class = torch.argmax(classification_probs, dim=-1).item()
predicted_class_label = classification_label_dict_reverse.get(predicted_class, "Unknown")
print(f"Predicted class index: {predicted_class}")
print(f"Predicted class label: {predicted_class_label}")

# NER: one predicted tag per WordPiece sub-token
ner_predictions = torch.argmax(ner_logits, dim=-1).squeeze().tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze())
for token, pred_idx in zip(tokens, ner_predictions):
    tag = idx2tag.get(pred_idx, "O")
    if token in ["[CLS]", "[SEP]"]:
        tag = "O"
    print(f"{token}: {tag}")
```
Output:

```text
Predicted class index: 14
Predicted class label: Πολιτική και κυβέρνηση
[CLS]: O
Ο: O
Κ: B-PERSON
##υ: B-PERSON
##ριά: B-PERSON
##κος: B-PERSON
Μ: I-PERSON
##η: I-PERSON
##τ: I-PERSON
##σο: I-PERSON
##τά: I-PERSON
##κης: I-PERSON
ε: O
##π: O
##ι: O
##σ: O
##κ: O
##έ: O
##φθηκε: O
τη: O
Θ: B-GPE
##ε: B-GPE
##σσα: B-GPE
##λο: I-ORG
##νίκη: I-ORG
για: O
τα: O
ε: O
##γκ: O
##α: O
##ί: O
##νια: O
της: O
Δ: B-ORG
##Ε: I-ORG
##Θ: I-ORG
.: O
[SEP]: O
```
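As the output shows, the NER head emits one BIO tag per WordPiece sub-token, so downstream use usually requires merging sub-tokens back into word-level entity spans. Below is a minimal sketch of such a merge; `merge_entities` is an illustrative helper, not part of the released model, and it takes each word's tag from its first sub-token (which also absorbs inconsistent continuation tags such as the `I-ORG` pieces inside "Θεσσαλονίκη" above):

```python
# Illustrative helper (not part of the released model): merge WordPiece
# sub-tokens and their BIO tags into word-level entity spans.
# `tokens` and `ner_predictions` are the parallel lists from the example above.
def merge_entities(tokens, tags):
    entities = []
    current_text, current_label = "", None
    for token, tag in zip(tokens, tags):
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue
        if token.startswith("##"):
            # Continuation piece: glue it onto the current entity, if any
            if current_label is not None:
                current_text += token[2:]
            continue
        # A new word starts here; its first-sub-token tag decides what happens
        if tag.startswith("B-"):
            if current_label is not None:
                entities.append((current_text, current_label))
            current_text, current_label = token, tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_text += " " + token
        else:
            # O tag (or an I- tag that does not continue anything) closes the span
            if current_label is not None:
                entities.append((current_text, current_label))
            current_text, current_label = "", None
    if current_label is not None:
        entities.append((current_text, current_label))
    return entities

word_tags = [idx2tag.get(i, "O") for i in ner_predictions]
for text, label in merge_entities(tokens, word_tags):
    print(text, label)
# Expected for the sentence above:
# Κυριάκος Μητσοτάκης PERSON
# Θεσσαλονίκη GPE
# ΔΕΘ ORG
```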
## Author

This model has been released alongside the article *Named Entity Recognition and News Article Classification: A Lightweight Approach*.

If you use this model, please cite the following:
```bibtex
@ARTICLE{11148234,
  author={Katranis, Ioannis and Troussas, Christos and Krouska, Akrivi and Mylonas, Phivos and Sgouropoulou, Cleo},
  journal={IEEE Access},
  title={Named Entity Recognition and News Article Classification: A Lightweight Approach},
  year={2025},
  volume={13},
  number={},
  pages={155031-155046},
  keywords={Accuracy;Transformers;Pipelines;Named entity recognition;Computational modeling;Vocabulary;Tagging;Real-time systems;Benchmark testing;Training;Distilled transformer;edge-deployable model;multiclass news-topic classification;named entity recognition},
  doi={10.1109/ACCESS.2025.3605709}}
```