🇰🇭 Khmer Sentiment Analysis using XLM-RoBERTa

This is an XLM-RoBERTa model fine-tuned for binary sentiment classification (negative or positive).
It is designed mainly for Khmer text, but it can also process English text because XLM-RoBERTa was pretrained on multilingual data.

📌 Model Details

Base Model: XLM-RoBERTa (FacebookAI/xlm-roberta-base)
Architecture: Transformer Encoder for Sequence Classification
Task: Sentiment Analysis
Supported Languages:
- Khmer (Primary 🇰🇭)
- English (Partial 🇬🇧)
Labels:
- 0 → negative
- 1 → positive

Model Description

The model was fine-tuned from FacebookAI/xlm-roberta-base on a Khmer sentiment dataset.
Because the base model was pretrained on multilingual text, it accepts both Khmer and English input. The fine-tuning data is Khmer, however, so predictions on English text are likely to be less reliable.
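Since the model is tuned for Khmer, it can help to check how much of an input is actually Khmer before trusting a prediction. The following is a minimal sketch, not part of the model itself: the helper names `khmer_ratio` and `is_mostly_khmer` and the 0.5 threshold are assumptions made for illustration. It counts characters in the Unicode Khmer block (U+1780 to U+17FF).

```python
# Hypothetical helpers (not shipped with the model) for flagging non-Khmer input.
KHMER_START, KHMER_END = 0x1780, 0x17FF  # Unicode "Khmer" block


def khmer_ratio(text):
    """Return the fraction of non-whitespace characters in the Khmer block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    khmer = sum(1 for c in chars if KHMER_START <= ord(c) <= KHMER_END)
    return khmer / len(chars)


def is_mostly_khmer(text, threshold=0.5):
    """Assumed rule of thumb: treat text as Khmer if the ratio meets the threshold."""
    return khmer_ratio(text) >= threshold
```

An application could, for example, log a warning or show a lower-confidence notice when `is_mostly_khmer(text)` is false.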

How to Use

Install dependencies

pip install transformers torch

Run inference

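import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "phonsobon/khmer-sentiment-xlm-roberta"

# Load the tokenizer and the fine-tuned classification model from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Class index to label mapping used by this model.
labels = {
    0: "negative",
    1: "positive"
}

text = "សេវាកម្មល្អណាស់"  # Khmer: "The service is very good"

# Tokenize, truncating inputs longer than 512 tokens.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run a forward pass without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# Pick the class with the highest logit.
pred = torch.argmax(outputs.logits, dim=-1).item()
print("Text:", text)
print("Prediction:", labels[pred])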
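The snippet above prints only the predicted label. If a confidence score is also wanted, the raw logits can be passed through a softmax. The following is a minimal, dependency-free sketch; the function name `logits_to_prediction` is an assumption made for illustration, and the label mapping is the one documented in this card.

```python
import math

# Label mapping as documented in this model card.
LABELS = {0: "negative", 1: "positive"}


def logits_to_prediction(logits):
    """Convert one row of raw logits into a (label, confidence) pair."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest probability; ties resolve to the lowest index.
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

With the inference code above, this could be called as `logits_to_prediction(outputs.logits[0].tolist())`.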

Model size: 0.3B params
Tensor type: F32 (Safetensors)
