sign
/

utf8-lm-tiny

@@ -24,6 +24,39 @@ Using [this](https://github.com/sign/utf8-tokenizer/blob/main/experiments/langua
 The repository includes the joined model for ease of use, and the [bit_projection_weights.pt](https://huggingface.co/sign/utf8-lm-tiny/blob/main/bit_projection_weights.pt) for further analysis.
+## Usage
+```python
+import torch
+from transformers import AutoModelForCausalLM
+from utf8_tokenizer import UTF8Tokenizer
+
+model_id = "sign/utf8-lm-tiny"
+tokenizer = UTF8Tokenizer()
+model = AutoModelForCausalLM.from_pretrained(model_id)
+prompt = "My name is"
+inputs = tokenizer([prompt], return_tensors="pt",
+                   padding=True,
+                   add_special_tokens=True)
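+# Cast the token ids to torch.long before passing them to the model
+inputs["input_ids"] = inputs["input_ids"].to(torch.long)
+# Remove the trailing EOS token so the model continues the prompt instead of
+# seeing a finished sequence. With a single prompt there is no padding, so
+# the last position is the EOS token.
+inputs["input_ids"] = inputs["input_ids"][:, :-1]
+inputs["attention_mask"] = inputs["attention_mask"][:, :-1]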
+with torch.no_grad():
+    out = model.generate(
+        **inputs,
+        max_new_tokens=64,
+    )
+print(tokenizer.decode(out[0], skip_special_tokens=False))
+```
 ## Training procedure
 ```shell