AmitMY committed · verified
Commit 0d6bc7b · 1 Parent(s): 7e3321c

Update README.md

Files changed (1): README.md (+33 −0)

README.md CHANGED
@@ -24,6 +24,39 @@ Using [this](https://github.com/sign/utf8-tokenizer/blob/main/experiments/langua
 
 The repository includes the joined model for ease of use, and the [bit_projection_weights.pt](https://huggingface.co/sign/utf8-lm-tiny/blob/main/bit_projection_weights.pt) for further analysis.
 
+## Usage
+
+```python
+from transformers import AutoModelForCausalLM
+import torch
+
+from utf8_tokenizer import UTF8Tokenizer
+
+model_id = "sign/utf8-lm-tiny"
+
+tokenizer = UTF8Tokenizer()
+model = AutoModelForCausalLM.from_pretrained(model_id)
+
+prompt = "My name is"
+
+inputs = tokenizer([prompt], return_tensors="pt",
+                   padding=True,
+                   add_special_tokens=True)
+inputs["input_ids"] = inputs["input_ids"].to(torch.long)
+# We need to remove the EOS token
+inputs["input_ids"] = inputs["input_ids"][:, :-1]
+inputs["attention_mask"] = inputs["attention_mask"][:, :-1]
+
+with torch.no_grad():
+    out = model.generate(
+        **inputs,
+        max_new_tokens=64,
+    )
+
+print(tokenizer.decode(out[0], skip_special_tokens=False))
+```
+
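For the further analysis that [bit_projection_weights.pt](https://huggingface.co/sign/utf8-lm-tiny/blob/main/bit_projection_weights.pt) is published for, a minimal sketch of fetching and inspecting the checkpoint. This assumes the file is a standard `torch.save` artifact (either a single tensor or a name-to-tensor state dict); `load_bit_projection` and `summarize` are hypothetical helpers, not part of the repository:

```python
from huggingface_hub import hf_hub_download
import torch


def load_bit_projection(repo_id="sign/utf8-lm-tiny",
                        filename="bit_projection_weights.pt"):
    """Download the checkpoint from the Hub and load it on CPU."""
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    return torch.load(path, map_location="cpu")


def summarize(state):
    """Map each tensor name to its shape; handle a bare-tensor checkpoint too."""
    if isinstance(state, dict):
        return {name: tuple(t.shape) for name, t in state.items()}
    return {"weights": tuple(state.shape)}
```

Calling `summarize(load_bit_projection())` prints nothing by itself but gives a quick overview of what the checkpoint contains before deeper analysis.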
 ## Training procedure
 
 ```shell