
language:
  - th
  - en
metrics:
  - comet
  - kendall's tau correlation
tags:
  - translation-evaluation
  - translation-metrics
  - mqm
  - ranking
  - translation-quality
model-index:
  - name: COMET-Kiwi
    results:
      - task:
          type: translation-quality-estimation
          name: English-Thai Translation Quality Assessment
        dataset:
          type: MEET-MR/MEET-MR
          name: MEET-MR
        metrics:
          - name: mqm correlation
            type: Kendall's tau correlation
            value: 0.402
            verified: false
          - name: rank correlation
            type: Kendall's tau correlation
            value: 0.415
            verified: false
datasets:
  - MEET-MR/MEET-MR
base_model:
  - Unbabel/wmt22-cometkiwi-da

This model is a reference-free Quality Estimation (QE) model for English-Thai translation. It is based on the COMET-Kiwi architecture (base checkpoint Unbabel/wmt22-cometkiwi-da) and has been fine-tuned on the MEET-MR dataset so that its scores align with human judgments of translation quality.

Model Description

The model was fine-tuned on the MEET-MR dataset, which comprises 2,142 English source sentences and their Thai translations across 9 diverse domains. Compared with the generic pretrained version, fine-tuning on this language pair and dataset improves the model's ability to capture Thai vocabulary, contextual nuances, and human preferences. On MEET-MR, it reaches a Kendall's tau correlation of 0.402 with MQM scores and 0.415 with human rankings.
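Agreement with human judgments is reported as Kendall's tau correlation, which counts how often the model and the human annotators order a pair of translations the same way. The sketch below makes that definition concrete. It assumes the tau-b variant (which corrects for ties), because the card does not state which variant was used. The function name kendall_tau_b and the example scores are illustrative only.

```python
from itertools import combinations
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length score lists (O(n^2) sketch).

    Assumption: tau-b is used here; the model card does not name the variant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both lists: excluded from both tie counts
        if dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif (dx > 0) == (dy > 0):
            concordant += 1  # both lists order the pair the same way
        else:
            discordant += 1  # the lists disagree on the pair's order
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical QE scores and human ratings (higher = better in both)
model_scores = [0.91, 0.42, 0.77, 0.15]
human_scores = [95, 60, 70, 20]
print(kendall_tau_b(model_scores, human_scores))  # → 1.0 (every pair agrees)
```

The quadratic loop is only meant to make the definition explicit; for real evaluations, scipy.stats.kendalltau computes tau-b by default.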

This model is designed to estimate the quality of English-to-Thai machine translations without using reference translations. Given a source text and its translation, the model outputs a single score between 0 and 1, where 1 represents a perfect translation.
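Because the model is reference-free, each input needs only the English source and its Thai translation. The following minimal sketch assembles that input from two parallel lists. It assumes the "src"/"mt" dictionary keys used in the Usage example, and build_qe_inputs is a hypothetical helper, not part of the COMET library.

```python
def build_qe_inputs(sources, translations):
    """Pair each English source with its Thai MT output for reference-free QE.

    Hypothetical helper: returns the list-of-dicts format passed to model.predict.
    No "ref" key is added, since reference translations are not required.
    """
    if len(sources) != len(translations):
        raise ValueError(
            f"got {len(sources)} sources but {len(translations)} translations"
        )
    return [{"src": src, "mt": mt} for src, mt in zip(sources, translations)]


data = build_qe_inputs(
    ["A hydrating day & night cream."],
    ["ครีมน้ำในวันและคืน"],
)
print(data)
```

Checking the list lengths up front catches misaligned source and translation files before any scoring is done.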

Paper

TBA

Usage

```python
from comet import download_model, load_from_checkpoint

# Download the checkpoint from the Hugging Face Hub and load it
model_path = download_model("MEET-MR/COMET-Kiwi-MEET-MR")
model = load_from_checkpoint(model_path)

# Only "src" (English source) and "mt" (Thai translation) are needed for
# reference-free QE; the "ref" fields are not required
data = [
    {
        "src": "The premises of the mission shall be inviolable.",
        "mt": "สถานที่ของภารกิจจะต้องไม่ถูกละเมิด",
        "ref": "อาคารและสถานที่ของคณะผู้แทนจะถูกละเมิดมิได้"
    },
    {
        "src": "A hydrating day & night cream.",
        "mt": "ครีมน้ำในวันและคืน",
        "ref": "ครีมให้ความชุ่มชื้นสำหรับกลางวันและกลางคืน"
    }
]

# Set gpus=0 to run on CPU
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output.scores)        # one score per segment
print(model_output.system_score)  # average over all segments
```
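The segment scores can then be post-processed in plain Python. This sketch assumes model_output.scores is a list with one score per input, in the same order (as in COMET 2.x). It flags translations below an illustrative threshold of 0.5; that cut-off is an assumption, not a value calibrated for this model. summarize_qe is a hypothetical helper, and the scores in the example are made up.

```python
from statistics import mean


def summarize_qe(data, scores, threshold=0.5):
    """Attach QE scores to their inputs and flag low-scoring translations.

    `threshold` is an illustrative cut-off (assumption), not a calibrated value.
    """
    if len(data) != len(scores):
        raise ValueError("expected exactly one score per input segment")
    segments = [
        {
            "src": item["src"],
            "mt": item["mt"],
            "score": score,
            "low_quality": score < threshold,
        }
        for item, score in zip(data, scores)
    ]
    # Plain average of segment scores, like COMET's system-level score
    return {"segments": segments, "system_score": mean(scores)}


example_data = [
    {"src": "The premises of the mission shall be inviolable.",
     "mt": "สถานที่ของภารกิจจะต้องไม่ถูกละเมิด"},
    {"src": "A hydrating day & night cream.", "mt": "ครีมน้ำในวันและคืน"},
]
summary = summarize_qe(example_data, [0.75, 0.25])  # hypothetical scores
for seg in summary["segments"]:
    print(f'{seg["score"]:.2f} low_quality={seg["low_quality"]} {seg["src"]}')
print("system score:", summary["system_score"])
```

With real output, pass model_output.scores in place of the hypothetical list.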