language:
- th
- en
metrics:
- comet
- kendall's tau correlation
tags:
- translation-evaluation
- translation-metrics
- mqm
- ranking
- translation-quality
model-index:
- name: COMET-Kiwi
  results:
  - task:
      type: translation-quality-estimation
      name: English-Thai Translation Quality Assessment
    dataset:
      type: MEET-MR/MEET-MR
      name: MEET-MR
    metrics:
    - name: mqm correlation
      type: Kendall's tau correlation
      value: 0.402
      verified: false
    - name: rank correlation
      type: Kendall's tau correlation
      value: 0.415
      verified: false
datasets:
- MEET-MR/MEET-MR
base_model:
- Unbabel/wmt22-cometkiwi-da
This model is a reference-free Quality Estimation (QE) model for the English-Thai language pair. It builds on the COMET-Kiwi architecture (base checkpoint: Unbabel/wmt22-cometkiwi-da) and has been fine-tuned on the MEET-MR dataset to align with human judgments of translation quality.
Model Description
The model was fine-tuned on the MEET-MR dataset, which comprises 2,142 English source sentences and their translations across 9 diverse domains. Fine-tuning COMET-Kiwi on this language pair and dataset improves its ability to capture Thai vocabulary, contextual nuances, and human preferences compared to the generic pretrained checkpoint. On MEET-MR, the self-reported (unverified) Kendall's tau correlations are 0.402 for MQM correlation and 0.415 for rank correlation.
This model is designed to estimate the quality of English-to-Thai machine translations without using reference translations. Given a source text and its translation, it outputs a single score between 0 and 1, where 1 represents a perfect translation.
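The agreement with human judgments is reported in the metadata as Kendall's tau. As a rough illustration of what that statistic measures, the sketch below computes the tau-a variant (no tie correction) over pairs of items. The function name `kendall_tau_a` and all scores are hypothetical, and the variant used for the reported values is not stated in this card, so this is an assumption rather than the evaluation procedure.

```python
from itertools import combinations


def kendall_tau_a(model_scores, human_scores):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant (assumption:
    no tie correction, unlike the tau-b variant).
    """
    if len(model_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")
    n = len(model_scores)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both lists order the pair the same way.
        product = (model_scores[i] - model_scores[j]) * (
            human_scores[i] - human_scores[j]
        )
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical scores for illustration only (not taken from MEET-MR).
model_scores = [0.91, 0.35, 0.60, 0.12]
human_scores = [0.95, 0.40, 0.30, 0.10]
print(kendall_tau_a(model_scores, human_scores))
```

A value near 1 means the model orders translations the same way human annotators do, 0 means no agreement in ordering, and negative values mean the orderings tend to disagree.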
Paper
TBA
Usage
from comet import download_model, load_from_checkpoint
# Download the checkpoint from the Hugging Face Hub, then load it
model_path = download_model("MEET-MR/COMET-Kiwi-MEET-MR")
model = load_from_checkpoint(model_path)
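# Each sample needs "src" (source text) and "mt" (machine translation).
# The model is reference-free, so "ref" is shown only for comparison.
data = [
    {
        "src": "The premises of the mission shall be inviolable.",
        "mt": "สถานที่ของภารกิจจะต้องไม่ถูกละเมิด",
        "ref": "อาคารและสถานที่ของคณะผู้แทนจะถูกละเมิดมิได้"
    },
    {
        "src": "A hydrating day & night cream.",
        "mt": "ครีมน้ำในวันและคืน",
        "ref": "ครีมให้ความชุ่มชื้นสำหรับกลางวันและกลางคืน"
    }
]
# gpus=1 runs inference on one GPU; set gpus=0 to run on CPU.
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)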
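In the `unbabel-comet` package, the object returned by `predict` exposes per-segment scores as `model_output.scores` and their average as `model_output.system_score`. One possible use of the per-segment scores is ranking several candidate translations of the same source. The sketch below shows that step in isolation. The helper name `rank_translations` and the score values are hypothetical placeholders, not outputs of this model.

```python
def rank_translations(candidates, scores):
    """Return (score, candidate) pairs sorted from highest to lowest score."""
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    return sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)


# Two candidate Thai translations of the same English source sentence.
candidates = [
    "สถานที่ของภารกิจจะต้องไม่ถูกละเมิด",
    "อาคารและสถานที่ของคณะผู้แทนจะถูกละเมิดมิได้",
]

# Placeholder values standing in for model_output.scores (hypothetical).
hypothetical_scores = [0.48, 0.83]

ranked = rank_translations(candidates, hypothetical_scores)
for score, text in ranked:
    print(f"{score:.2f}\t{text}")
```

In practice, `hypothetical_scores` would be replaced by the scores predicted for samples that share the same `src` and differ only in `mt`.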