transhumanist-already-exists committed on
Commit 33e9d06 · verified
1 Parent(s): 0b68354

Update README.md

Files changed (1)
  1. README.md +1 -3
README.md CHANGED
@@ -24,8 +24,6 @@ pretty_name: “gemma-3 - ukrainized gemma tokenizer”
  <img src="tereshchenkoblue.png" width="300px" style="margin-left:'auto' margin-right:'auto' display:'block'" caption=""/>
  <figcaption><a ref="https://en.wikipedia.org/wiki/Tereshchenko_diamond">Tereshchenko Blue is the second biggest blue diamond in the world</a></figcaption>
  </figure>
- <!--
- ![ alt text](<image_url> “Your image caption”) -->
 
  ### By adding more than 80K Ukrainian tokens **without removing any English or EU languages tokens**, Tereshchenko Blue makes Ukrainian the core language in the multilingual Gemma-3 tokenizer while keeping the vocabulary fixed at its original size of 256K tokens.
 
@@ -58,7 +56,7 @@ Roughly four-fifths of tokens in scripts geographically and culturally distant f
  |Tibetan|107|26|
  |Oriya|100|25|
  |Cyrillic|13398|0|
- |Gemma-3 \<unused-*\>|6500|102|
+ |Gemma-3 \<unused-*\>|6139|102|
 
 
  ## Feature Overview: