---
title: README
emoji: 🏆
colorFrom: purple
colorTo: blue
sdk: static
pinned: false
---
https://hplt-project.org/

Our project name, HPLT, is an acronym for High Performance Language Technologies.
We combine large quantities of data, many languages, and high-performance computing to build powerful and efficient datasets for language and translation models.
HPLT also aims to publish the results of the project in a shared space under open licenses.

- Version 3 of the HPLT datasets (198 languages):
  - https://hplt-project.org/datasets/v3.0
  - https://hf.co/datasets/HPLT/HPLT3.0
- [HPLT-E](https://github.com/hplt-project/hplt-e): a framework for comprehensive multilingual and multi-prompt k-shot evaluation in nine languages
- [HPLT datasets ACL'2025 paper](https://aclanthology.org/2025.acl-long.854/)
- Version 2 of the HPLT datasets (193 languages):
  - https://hplt-project.org/datasets/v2.0
  - https://hf.co/datasets/HPLT/HPLT2.0_cleaned
- Version 1.2 of the HPLT datasets (75 languages):
  - https://hplt-project.org/datasets/v1.2
  - https://huggingface.co/datasets/HPLT/hplt_monolingual_v1_2
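
The Hugging Face links above can be read with the `datasets` library. The following is a minimal sketch, not an official loader: the repository IDs are copied from the links above, while the helper names (`HPLT_REPOS`, `hub_url`, `peek`), the `config` argument, and the `train` split are assumptions for illustration. Check each dataset card for the actual configuration and split names.

```python
from itertools import islice

# Repository IDs copied from the links in the list above.
HPLT_REPOS = {
    "3.0": "HPLT/HPLT3.0",
    "2.0": "HPLT/HPLT2.0_cleaned",
    "1.2": "HPLT/hplt_monolingual_v1_2",
}


def hub_url(version):
    """Return the Hugging Face Hub URL for a given HPLT dataset version."""
    return f"https://huggingface.co/datasets/{HPLT_REPOS[version]}"


def peek(version, config, n=3, split="train"):
    """Stream the first n records without downloading the whole dataset.

    `config` and `split` are assumptions here: look them up on the
    dataset card before calling this function.
    """
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset  # pip install datasets

    stream = load_dataset(HPLT_REPOS[version], config, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek("2.0", config)` with a configuration name taken from the dataset card would return the first few records as dictionaries. Streaming is used because the full datasets are large.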

*This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350 and from UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee [grant number 10052546].*