NagaNLP POS Tagger (XLM-RoBERTa)

NagaNLP-POS is a Part-of-Speech (POS) tagging model fine-tuned for Nagamese (Naga Pidgin). It is built on XLM-RoBERTa Base and achieves an F1-score of 0.91 on a held-out test set.

This model is part of the NagaNLP project, aimed at developing foundational resources for the low-resource languages of Nagaland.

Model Details

  • Developer: Agniva Maiti
  • Base Architecture: XLM-RoBERTa Base
  • Task: Token Classification (POS Tagging)
  • Language: Nagamese (nag)
  • Training Data: ~214 annotated sentences (CoNLL format).
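The exact column layout of the training file is not specified beyond "CoNLL format"; as a rough illustration, a minimal reader for the common two-column variant (token, tab, tag; blank line between sentences) could look like this:

```python
# Sketch of a CoNLL-style reader, assuming a simple token<TAB>tag layout
# (the actual corpus columns may differ).
def read_conll(lines):
    """Parse lines into a list of (tokens, tags) sentence pairs."""
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line:  # blank line ends the current sentence
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        token, tag = line.split("\t")[:2]
        tokens.append(token)
        tags.append(tag)
    if tokens:  # flush a trailing sentence with no final blank line
        sentences.append((tokens, tags))
    return sentences

sample = ["moi\tPRON", "jai\tVERB", ".\tPUNCT", ""]
print(read_conll(sample))  # [(['moi', 'jai', '.'], ['PRON', 'VERB', 'PUNCT'])]
```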

Performance

The model was evaluated on a held-out test set (10% split):

  • F1 Score: 0.9127
  • Accuracy: ~99% (on validation set)
  • Validation Loss: 0.77

How to Use

You can use this model directly with the Hugging Face pipeline:

from transformers import pipeline

# Load the pipeline
# Note: Aggregation strategy 'simple' merges sub-tokens into words
pos_pipeline = pipeline(
    "token-classification",
    model="agnivamaiti/naganlp-pos-tagger",
    aggregation_strategy="simple"
)

# Inference
text = "moi etiya school jai ase."
results = pos_pipeline(text)

# Print results
for entity in results:
    print(f"{entity['word']}: {entity['entity_group']} ({entity['score']:.2f})")

Output:

moi: PRON (0.22)
etiya: ADV (0.52)
school: NOUN (0.92)
jai ase: VERB (0.95)
.: PUNCT (0.95)
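Note that some scores are low ("moi" at 0.22), consistent with the small training corpus. A simple post-processing step, sketched here against the dict shape the token-classification pipeline returns with aggregation_strategy="simple", can drop low-confidence tags:

```python
# Sketch: filter pipeline results to confident (word, tag) pairs.
# The dicts mirror the Hugging Face token-classification pipeline output
# shown above; the 0.5 threshold is an arbitrary example value.
def confident_tags(results, threshold=0.5):
    """Keep only word/tag pairs whose score meets the threshold."""
    return [(r["word"], r["entity_group"])
            for r in results if r["score"] >= threshold]

results = [
    {"word": "moi", "entity_group": "PRON", "score": 0.22},
    {"word": "etiya", "entity_group": "ADV", "score": 0.52},
    {"word": "school", "entity_group": "NOUN", "score": 0.92},
    {"word": "jai ase", "entity_group": "VERB", "score": 0.95},
    {"word": ".", "entity_group": "PUNCT", "score": 0.95},
]
print(confident_tags(results))
# [('etiya', 'ADV'), ('school', 'NOUN'), ('jai ase', 'VERB'), ('.', 'PUNCT')]
```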

Training Details

  • Epochs: 10
  • Batch Size: 16
  • Learning Rate: 3e-5
  • Optimizer: AdamW
  • Precision: Float32
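The original training script is not included; the hyperparameters above can be reproduced with the Hugging Face Trainer roughly as follows (dataset objects, output directory, and label count are placeholders, not values from the actual run):

```python
# Sketch only: Trainer setup matching the listed hyperparameters.
# num_labels, output_dir, and the datasets are illustrative placeholders.
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=17  # e.g. the 17 UPOS tags; actual tagset may differ
)

args = TrainingArguments(
    output_dir="naganlp-pos",
    num_train_epochs=10,
    per_device_train_batch_size=16,
    learning_rate=3e-5,  # Trainer uses AdamW by default
    fp16=False,          # Float32 precision, as listed
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # placeholder: tokenized, label-aligned dataset
    eval_dataset=eval_dataset,    # placeholder: 10% held-out split
)
# trainer.train()
```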

Limitations

  • Data Scarcity: Trained on a small corpus (~200 sentences). While it performs well on simple sentences, it may struggle with complex grammatical structures or rare vocabulary.
  • Code-Switching: Nagamese text frequently mixes in English and Assamese words. This model is optimized for standard Nagamese and may be less reliable on heavily code-switched input.

Citation

If you use this model, please cite the associated NagaNLP research paper. Citation details to be added.
