## Model Description

This model is a fine-tuned version of the [BERT multilingual base model](https://huggingface.co/google-bert/bert-base-multilingual-cased), trained jointly for Greek news-topic classification and named entity recognition.
## Dataset

The model was fine-tuned on the GreekNews-20k dataset.
## Results

Performance on the GreekNews-20k dataset (news-topic classification):
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Αυτοκίνητο | 0.84 | 0.94 | 0.89 | 201 |
| Επιχειρήσεις και βιομηχανία | 0.62 | 0.75 | 0.68 | 369 |
| Έγκλημα και δικαιοσύνη | 0.83 | 0.89 | 0.86 | 314 |
| Ειδήσεις για καταστροφές και έκτακτες ανάγκες | 0.87 | 0.67 | 0.76 | 272 |
| Οικονομικά και χρηματοοικονομικά | 0.70 | 0.69 | 0.70 | 495 |
| Εκπαίδευση | 0.90 | 0.86 | 0.88 | 259 |
| Ψυχαγωγία και πολιτισμός | 0.74 | 0.71 | 0.72 | 251 |
| Περιβάλλον και κλίμα | 0.66 | 0.76 | 0.71 | 292 |
| Οικογένεια και σχέσεις | 0.95 | 0.66 | 0.78 | 294 |
| Μόδα | 0.74 | 0.94 | 0.83 | 259 |
| Τρόφιμα και ποτά | 0.69 | 0.79 | 0.74 | 262 |
| Υγεία και ιατρική | 0.74 | 0.63 | 0.68 | 346 |
| Μεταφορές και υποδομές | 0.76 | 0.80 | 0.78 | 321 |
| Ψυχική υγεία και ευεξία | 0.68 | 0.78 | 0.72 | 348 |
| Πολιτική και κυβέρνηση | 0.82 | 0.63 | 0.71 | 339 |
| Θρησκεία | 0.94 | 0.86 | 0.89 | 271 |
| Αθλητισμός | 0.99 | 0.95 | 0.97 | 212 |
| Ταξίδια και αναψυχή | 0.77 | 0.83 | 0.80 | 424 |
| Τεχνολογία και επιστήμη | 0.79 | 0.68 | 0.73 | 308 |
| accuracy |  |  | 0.77 | 5837 |
| macro avg | 0.79 | 0.78 | 0.78 | 5837 |
| weighted avg | 0.78 | 0.77 | 0.77 | 5837 |

Performance on the GreekNews-20k dataset (named entity recognition):

| Entity | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| CARDINAL | 0.89 | 0.94 | 0.91 | 16065 |
| DATE | 0.88 | 0.92 | 0.90 | 10273 |
| EVENT | 0.66 | 0.55 | 0.60 | 1288 |
| FAC | 0.45 | 0.43 | 0.44 | 1348 |
| GPE | 0.83 | 0.91 | 0.87 | 10325 |
| LOC | 0.76 | 0.60 | 0.67 | 2197 |
| MONEY | 0.80 | 0.81 | 0.81 | 2359 |
| NORP | 0.90 | 0.85 | 0.88 | 1305 |
| ORDINAL | 0.93 | 0.97 | 0.95 | 2629 |
| ORG | 0.75 | 0.79 | 0.77 | 14768 |
| PERCENT | 0.77 | 0.79 | 0.78 | 4523 |
| PERSON | 0.88 | 0.89 | 0.88 | 10915 |
| PRODUCT | 0.63 | 0.48 | 0.54 | 1346 |
| QUANTITY | 0.70 | 0.68 | 0.69 | 1636 |
| TIME | 0.78 | 0.86 | 0.82 | 1602 |
| micro avg | 0.82 | 0.85 | 0.84 | 127262 |
| macro avg | 0.77 | 0.76 | 0.77 | 127262 |
| weighted avg | 0.82 | 0.85 | 0.83 | 127262 |

Performance on the elNER dataset (named entity recognition):

| Entity | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| CARDINAL | 0.92 | 0.95 | 0.93 | 911 |
| DATE | 0.90 | 0.90 | 0.90 | 838 |
| EVENT | 0.61 | 0.43 | 0.50 | 130 |
| FAC | 0.40 | 0.35 | 0.37 | 77 |
| GPE | 0.81 | 0.90 | 0.85 | 826 |
| LOC | 0.85 | 0.62 | 0.72 | 178 |
| MONEY | 0.96 | 0.95 | 0.95 | 111 |
| NORP | 0.92 | 0.82 | 0.86 | 141 |
| ORDINAL | 0.95 | 0.91 | 0.93 | 172 |
| ORG | 0.75 | 0.73 | 0.74 | 1388 |
| PERCENT | 0.98 | 0.96 | 0.97 | 206 |
| PERSON | 0.90 | 0.90 | 0.90 | 1051 |
| PRODUCT | 0.58 | 0.37 | 0.46 | 83 |
| QUANTITY | 0.77 | 0.78 | 0.78 | 65 |
| TIME | 0.88 | 0.85 | 0.86 | 137 |
| micro avg | 0.85 | 0.84 | 0.84 | 6314 |
| macro avg | 0.81 | 0.76 | 0.78 | 6314 |
| weighted avg | 0.84 | 0.84 | 0.84 | 6314 |
## How to use this model

Install the dependencies:

```bash
pip install transformers torch
```

Load the model and tokenizer (the model's custom code provides the dual classification/NER heads, hence `trust_remote_code=True`):

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("katrjohn/mBertGreekNews", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
```
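Note that BERT-based models accept at most 512 tokens per input, so full-length news articles will likely need truncation. A minimal sketch using the standard tokenizer arguments (`article_text` is a hypothetical placeholder):

```python
# Hedged sketch: truncate long inputs to BERT's 512-token limit.
# `article_text` stands in for any Greek news article string.
inputs = tokenizer(article_text, return_tensors="pt", truncation=True, max_length=512)
```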
### Example usage
```python
import torch

# Classification label dictionary (reverse)
classification_label_dict_reverse = {
    0: "Αυτοκίνητο", 1: "Επιχειρήσεις και βιομηχανία", 2: "Έγκλημα και δικαιοσύνη",
    3: "Ειδήσεις για καταστροφές και έκτακτες ανάγκες", 4: "Οικονομικά και χρηματοοικονομικά", 5: "Εκπαίδευση",
    6: "Ψυχαγωγία και πολιτισμός", 7: "Περιβάλλον και κλίμα", 8: "Οικογένεια και σχέσεις",
    9: "Μόδα", 10: "Τρόφιμα και ποτά", 11: "Υγεία και ιατρική", 12: "Μεταφορές και υποδομές",
    13: "Ψυχική υγεία και ευεξία", 14: "Πολιτική και κυβέρνηση", 15: "Θρησκεία",
    16: "Αθλητισμός", 17: "Ταξίδια και αναψυχή", 18: "Τεχνολογία και επιστήμη"
}

# BIO tag set used by the NER head
ner_label_set = ["PAD", "O",
                 "B-ORG", "I-ORG", "B-PERSON", "I-PERSON", "B-CARDINAL", "I-CARDINAL",
                 "B-GPE", "I-GPE", "B-DATE", "I-DATE", "B-ORDINAL", "I-ORDINAL",
                 "B-PERCENT", "I-PERCENT", "B-LOC", "I-LOC", "B-NORP", "I-NORP",
                 "B-MONEY", "I-MONEY", "B-TIME", "I-TIME", "B-EVENT", "I-EVENT",
                 "B-PRODUCT", "I-PRODUCT", "B-FAC", "I-FAC", "B-QUANTITY", "I-QUANTITY"
                 ]
tag2idx = {t: i for i, t in enumerate(ner_label_set)}
idx2tag = {i: t for t, i in tag2idx.items()}

sentence = "Ο Κυριάκος Μητσοτάκης επισκέφθηκε τη Θεσσαλονίκη για τα εγκαίνια της ΔΕΘ."
inputs = tokenizer(sentence, return_tensors="pt")

# The model returns one set of logits per task
with torch.no_grad():
    classification_logits, ner_logits = model(**inputs)

# Classification
classification_probs = torch.softmax(classification_logits, dim=-1)
predicted_class = torch.argmax(classification_probs, dim=-1).item()
predicted_class_label = classification_label_dict_reverse.get(predicted_class, "Unknown")
print(f"Predicted class index: {predicted_class}")
print(f"Predicted class label: {predicted_class_label}")

# NER: one predicted tag per WordPiece sub-token
ner_predictions = torch.argmax(ner_logits, dim=-1).squeeze().tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"].squeeze())
for token, pred_idx in zip(tokens, ner_predictions):
    tag = idx2tag.get(pred_idx, "O")
    if token in ["[CLS]", "[SEP]"]:
        tag = "O"
    print(f"{token}: {tag}")
```
Output:

```text
Predicted class index: 14
Predicted class label: Πολιτική και κυβέρνηση
[CLS]: O
Ο: O
Κ: B-PERSON
##υ: B-PERSON
##ριά: B-PERSON
##κος: B-PERSON
Μ: I-PERSON
##η: I-PERSON
##τ: I-PERSON
##σο: I-PERSON
##τά: I-PERSON
##κης: I-PERSON
ε: O
##π: O
##ι: O
##σ: O
##κ: O
##έ: O
##φθηκε: O
τη: O
Θ: B-GPE
##ε: B-GPE
##σσα: B-GPE
##λο: I-ORG
##νίκη: I-ORG
για: O
τα: O
ε: O
##γκ: O
##α: O
##ί: O
##νια: O
της: O
Δ: B-ORG
##Ε: I-ORG
##Θ: I-ORG
.: O
[SEP]: O
```
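As the output shows, the NER head emits one BIO tag per WordPiece sub-token, so downstream use usually requires merging sub-tokens back into word-level entity spans. Below is a minimal sketch of such a merge; `merge_entities` is an illustrative helper, not part of the released model, and it takes each word's tag from its first sub-token (which also absorbs inconsistent continuation tags such as the `I-ORG` pieces inside "Θεσσαλονίκη" above):

```python
# Illustrative helper (not part of the released model): merge WordPiece
# sub-tokens and their BIO tags into word-level entity spans.
# `tokens` and `ner_predictions` are the parallel lists from the example above.
def merge_entities(tokens, tags):
    entities = []
    current_text, current_label = "", None
    for token, tag in zip(tokens, tags):
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue
        if token.startswith("##"):
            # Continuation piece: glue it onto the current entity, if any
            if current_label is not None:
                current_text += token[2:]
            continue
        # A new word starts here; its first-sub-token tag decides what happens
        if tag.startswith("B-"):
            if current_label is not None:
                entities.append((current_text, current_label))
            current_text, current_label = token, tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_text += " " + token
        else:
            # O tag (or an I- tag that does not continue anything) closes the span
            if current_label is not None:
                entities.append((current_text, current_label))
            current_text, current_label = "", None
    if current_label is not None:
        entities.append((current_text, current_label))
    return entities

word_tags = [idx2tag.get(i, "O") for i in ner_predictions]
for text, label in merge_entities(tokens, word_tags):
    print(text, label)
# Expected for the sentence above:
# Κυριάκος Μητσοτάκης PERSON
# Θεσσαλονίκη GPE
# ΔΕΘ ORG
```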
## Author

This model has been released alongside the article *Named Entity Recognition and News Article Classification: A Lightweight Approach*.

If you use this model, please cite the following:
```bibtex
@ARTICLE{11148234,
  author={Katranis, Ioannis and Troussas, Christos and Krouska, Akrivi and Mylonas, Phivos and Sgouropoulou, Cleo},
  journal={IEEE Access},
  title={Named Entity Recognition and News Article Classification: A Lightweight Approach},
  year={2025},
  volume={13},
  number={},
  pages={155031-155046},
  keywords={Accuracy;Transformers;Pipelines;Named entity recognition;Computational modeling;Vocabulary;Tagging;Real-time systems;Benchmark testing;Training;Distilled transformer;edge-deployable model;multiclass news-topic classification;named entity recognition},
  doi={10.1109/ACCESS.2025.3605709}}
```