What is this model?

This is my first successful neural network model: essentially a fine-tuned DialoGPT-medium. To run it, you can use the chat example from the DialoGPT-medium documentation; I've adapted it below so you don't have to worry about it:

Chat Example

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("hacer201145/Hasex0.1-355M")
model = AutoModelForCausalLM.from_pretrained("hacer201145/Hasex0.1-355M")

# Let's chat for 5 lines
for step in range(5):
  # encode the new user input, add the eos_token, and return a tensor in PyTorch
  new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

  # append the new user input tokens to the chat history
  bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

  # generate a response while limiting the total chat history to 1000 tokens
  chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

  # pretty-print the last output tokens from the bot
  print("Hasex: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

To run this code you'll need the Transformers library and Python 3 or later. I'll say this: initially I thought the model couldn't generate responses and that I'd broken the tokenizer or something, but after inserting the correct template everything worked. Also, if you need an interactive version of this code, just write a suitable prompt for ChatGPT, DeepSeek, Gemini, or whatever you're using.
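
If you'd rather skip that step, here is a minimal sketch of an interactive loop built on the same example; the "quit" exit word and the 1000-token history cap are my own choices, not part of the original DialoGPT snippet:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("hacer201145/Hasex0.1-355M")
model = AutoModelForCausalLM.from_pretrained("hacer201145/Hasex0.1-355M")

chat_history_ids = None
while True:
    # read the next user message; type "quit" (or an empty line) to stop
    text = input(">> User: ").strip()
    if not text or text.lower() == "quit":
        break

    # encode the user input and append the eos_token
    new_user_input_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors='pt')

    # append the new tokens to the running chat history, if there is one
    bot_input_ids = new_user_input_ids if chat_history_ids is None else torch.cat([chat_history_ids, new_user_input_ids], dim=-1)

    # generate a response while limiting the total chat history to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # print only the tokens generated after the user's input
    print("Hasex: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))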

Fun facts

  • I initially didn't know which model to use as the base, but then I came across DialoGPT-medium. It's strange that I didn't use the newer GODEL. If I were training it now, I'd probably use Gemma3-1b or Gemma3-4b.
  • All the models I train or quantize are made in Google Colab, but because the TPU RAM limit dropped from 350GB to 12GB, I currently can't migrate new models to a different weight type or a better transformer architecture (see the weight-type sketch after this list).
  • The model was named Hasex because I simply took my username and changed a letter; that's the name I liked.
  • I would never have used a large model as the basis for this one, not because of its size or resources, but because I'm against neural networks replacing developers. We must remember that they are tools, not replacements for humans.
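
On the weight-type point above: here is a minimal sketch, using the standard Transformers API, of what converting this checkpoint to half-precision weights could look like. The fp16 dtype and the output folder name are my own choices, not something this model card specifies.

from transformers import AutoModelForCausalLM
import torch

# reload the published checkpoint with fp16 weights to roughly halve its memory footprint
model = AutoModelForCausalLM.from_pretrained("hacer201145/Hasex0.1-355M", torch_dtype=torch.float16)

# save the converted weights locally (hypothetical folder name)
model.save_pretrained("Hasex0.1-355M-fp16")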

P.S. I'm not responsible for this neural network model's responses. If you were offended by something it said, I hope it won't say it again.
