transhumanist-already-exists/tereshchenkoblue-tokenizer

@@ -24,8 +24,6 @@ pretty_name: “gemma-3 - ukrainized gemma tokenizer”
     <img src="tereshchenkoblue.png" width="300px" style="margin-left:'auto' margin-right:'auto' display:'block'" caption=""/>
     <figcaption><a ref="https://en.wikipedia.org/wiki/Tereshchenko_diamond">Tereshchenko Blue is the second biggest blue diamond in the world</a></figcaption>
 </figure>
-<!--
-![ alt text](<image_url> “Your image caption”) -->
 ### By adding more than 80K Ukrainian tokens **without removing any English or EU languages tokens**, Tereshchenko Blue makes Ukrainian the core language in the multilingual Gemma-3 tokenizer while keeping the vocabulary fixed at its original size of 256K tokens.
@@ -58,7 +56,7 @@ Roughly four-fifths of tokens in scripts geographically and culturally distant f
 |Tibetan|107|26|
 |Oriya|100|25|
 |Cyrillic|13398|0|
-|Gemma-3 \<unused-*\>|6500|102|
+|Gemma-3 \<unused-*\>|6139|102|
 ## Feature Overview:
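Per-script token counts like those in the table can be approximated by labelling each vocabulary entry with the Unicode script of its first letter. The sketch below is a heuristic under stated assumptions, not necessarily how the table was produced. It assumes the vocabulary is available as an iterable of token strings. It also assumes SentencePiece's `▁` (U+2581) marks word boundaries and that Gemma-3's reserved placeholders follow the pattern `<unusedN>`. The helper names `token_script` and `count_scripts` are hypothetical.

```python
import re
import unicodedata
from collections import Counter
from typing import Iterable

# Assumption: reserved placeholder tokens are named like "<unused0>", "<unused1>", ...
UNUSED_TOKEN = re.compile(r"^<unused\d+>$")
# SentencePiece prefixes word-initial pieces with U+2581 ("▁").
WORD_BOUNDARY = "\u2581"


def token_script(token: str) -> str:
    """Label a token with the Unicode script of its first letter (heuristic)."""
    if UNUSED_TOKEN.match(token):
        return "Gemma-3 <unused-*>"
    for ch in token.replace(WORD_BOUNDARY, ""):
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER PE" -> "Cyrillic", "TIBETAN LETTER KA" -> "Tibetan"
            name = unicodedata.name(ch, "")
            return name.split(" ")[0].title() if name else "Unknown"
    # Digits, punctuation, whitespace and byte tokens contain no letters.
    return "Other"


def count_scripts(vocab: Iterable[str]) -> Counter:
    """Count vocabulary entries per script label."""
    return Counter(token_script(tok) for tok in vocab)


if __name__ == "__main__":
    sample = ["\u2581hello", "привіт", "ཀ", "<unused0>", "<unused12>", "123"]
    # Print rows in the same |Script|Count| layout as the README table.
    for script, n in count_scripts(sample).most_common():
        print(f"|{script}|{n}|")
```

With a Hugging Face `transformers` tokenizer, `tokenizer.get_vocab()` returns a token-to-id dict whose keys can be passed straight to `count_scripts`. Running it on the original and the modified tokenizer would give the before/after columns. Because only the first letter is inspected, mixed-script tokens are attributed to whichever script appears first.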