
Catherine Arnett

catherinearnett

https://catherinearnett.github.io/

AI & ML interests

multilingual NLP, tokenization

Recent Activity

updated a collection 6 days ago

Global PIQA

liked a dataset 12 days ago

khaledyusuf44/somaliweb-v1

updated a Space 13 days ago

catherinearnett/multiblimp-leaderboard


Organizations

Articles 7

Article

There is no such thing as a tokenizer-free lunch


Spaces 1

Multiblimp Leaderboard

🥇

Leaderboard for MultiBLiMP

Models 21

catherinearnett/afr_Latn_als_Latn_9_91_bpe_nowhitespace_16384

98M • Updated May 14 • 55

catherinearnett/afr_Latn_als_Latn_50_50_bpe_nowhitespace_16384

98M • Updated May 14 • 32

catherinearnett/classical_armenian_goldfish

0.1B • Updated Apr 10 • 10

catherinearnett/B-GPT_pl_en_sequential

Text Generation • 0.1B • Updated Jun 12, 2025 • 7

catherinearnett/B-GPT_en_pl_sequential

Text Generation • 0.1B • Updated Jun 12, 2025 • 10

catherinearnett/B-GPT_pl_en_simultaneous

Text Generation • 0.1B • Updated Jun 12, 2025 • 4

catherinearnett/B-GPT_en_pl_simultaneous

Text Generation • 0.1B • Updated Jun 12, 2025 • 5

catherinearnett/B-GPT_el_en_sequential

Text Generation • 0.1B • Updated Jun 12, 2025 • 3

catherinearnett/B-GPT_en_el_sequential

Text Generation • 0.1B • Updated Jun 12, 2025 • 13

catherinearnett/B-GPT_el_en_simultaneous

Text Generation • 0.1B • Updated Jun 12, 2025 • 5


Datasets 35

catherinearnett/apertus_multiblimp

Updated 18 days ago • 1.92k

catherinearnett/trilingual-tokenizers

Updated 26 days ago • 31

catherinearnett/trilingual-tokenizer-data

Updated 26 days ago • 31

catherinearnett/bilingual_tokenizers2

Updated May 11 • 2.7k

catherinearnett/bilingual_tokenizers

Updated May 10 • 8

catherinearnett/monolingual_tokenizers

Updated May 10 • 645

catherinearnett/low_resource_clean

Viewer • Updated Apr 30 • 1.74M • 4

catherinearnett/low_german

Viewer • Updated Apr 30 • 97.5k • 8

catherinearnett/komi_permyak

Viewer • Updated Apr 30 • 1.25k • 7

catherinearnett/komi_zyrian

Viewer • Updated Apr 30 • 22.5k • 8


An Analysis of Multilingual Models on Hugging Face

Collections 5

Papers 15
