ada-flo/nlp-hack-debate-xlmr-lstm

Bilingual (English + Korean) LSTM seq2seq debate chatbot. The encoder is a frozen XLM-RoBERTa-base providing contextual hidden states; the decoder is an LSTM with Bahdanau attention. Trained on debate-shaped (topic, PRO, CON) records plus discourse corpora for fluency.

Test metrics

  • BLEU: 3.128
  • Loss: 5.515
  • Perplexity: 248.4
  • n_eval: 5,443
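
Perplexity here is the exponential of the mean token-level cross-entropy loss, so the two numbers above are mutually consistent:

import math

print(math.exp(5.515))  # ≈ 248.4, the reported test perplexity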

Training history

Epoch   Train loss   Valid loss   Valid PPL   Valid BLEU
1       5.959        5.902        365.6       2.338
2       5.826        5.813        334.6       2.448
3       5.731        5.745        312.7       2.715
4       5.652        5.661        287.4       2.458
5       5.586        5.618        275.4       2.666

Architecture

  • Encoder: xlmr (frozen XLM-RoBERTa-base)
  • Decoder: LSTM with Bahdanau attention (sketched after this list)
  • Embed dim: 256, hidden dim: 512
  • Encoder layers: 2, decoder layers: 1
  • Tokenizer: SentencePiece, 32k shared bilingual vocab (ships as spm.model)
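
For reference, a minimal sketch of the Bahdanau (additive) attention step such a decoder typically uses. Class and parameter names are illustrative, not the repo's exact implementation; 768 is XLM-R-base's hidden size, 512 the decoder hidden dim listed above:

import torch
import torch.nn as nn

class BahdanauAttention(nn.Module):
    # Additive attention: score(s, h_i) = v^T tanh(W_s s + W_h h_i)
    def __init__(self, dec_dim=512, enc_dim=768, attn_dim=512):
        super().__init__()
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)  # projects decoder state
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)  # projects encoder states
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(
            self.W_s(dec_state).unsqueeze(1) + self.W_h(enc_states)
        )).squeeze(-1)                                        # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights.unsqueeze(1), enc_states) # (batch, 1, enc_dim)
        return context.squeeze(1), weights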

Training config

  • Init from: checkpoints/xlmr-1777297357/best.pt
  • Epochs: 5, batch size: 16, lr: 0.0003
  • Optimizer: Adam, label smoothing: 0.1 (see the sketch after this list)
  • Max src/tgt length: 128/128
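
A minimal sketch of how this config maps onto standard PyTorch objects; PAD_ID is a placeholder (the real pad id comes from the SentencePiece vocab) and model stands for the Seq2Seq instance built as in the Inference section below:

import torch.nn as nn
from torch.optim import Adam

PAD_ID = 0  # placeholder; read the actual pad id from spm.model

optimizer = Adam(model.parameters(), lr=3e-4)        # lr: 0.0003
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID,
                                label_smoothing=0.1) # label smoothing: 0.1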

Files

  • best.pt – model weights (state_dict saved as model_state inside the checkpoint dict)
  • spm.model – SentencePiece tokenizer (32k shared bilingual vocab)
  • args.json – full training config
  • history.json – per-epoch validation metrics
  • test_metrics.json – final held-out test metrics
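
Both JSON files can be pulled straight from the Hub; a minimal way to inspect them (the exact field names are whatever the training script wrote, so treat this as a sketch):

import json
from huggingface_hub import hf_hub_download

repo = "ada-flo/nlp-hack-debate-xlmr-lstm"
with open(hf_hub_download(repo, "test_metrics.json")) as f:
    print(json.load(f))  # final BLEU / loss / perplexity
with open(hf_hub_download(repo, "history.json")) as f:
    print(json.load(f))  # per-epoch validation metrics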

Inference

Clone the source repo (https://github.com/ada-flo/nlp-hack), then:

import torch, sentencepiece as spm
from huggingface_hub import hf_hub_download
from src.model.lstm_seq2seq import Seq2Seq

# Fetch checkpoint and tokenizer from the Hub
ckpt_path = hf_hub_download("ada-flo/nlp-hack-debate-xlmr-lstm", "best.pt")
sp_path = hf_hub_download("ada-flo/nlp-hack-debate-xlmr-lstm", "spm.model")

# Shared 32k bilingual SentencePiece vocab
sp = spm.SentencePieceProcessor()
sp.Load(sp_path)

# The checkpoint is a plain dict; the weights live under "model_state"
ckpt = torch.load(ckpt_path, map_location="cuda", weights_only=False)

# Rebuild the architecture with the training-time hyperparameters
model = Seq2Seq(
    vocab_size=sp.get_piece_size(),
    embed_dim=256, hidden_dim=512,
    enc_layers=2, dec_layers=1,
    dropout=0.0, encoder_type="xlmr",
).cuda().eval()
model.load_state_dict(ckpt["model_state"], strict=False)
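
The tokenizer side is plain SentencePiece; the generation loop itself lives in the source repo's decoding utilities, so only the encode/decode round-trip is shown here:

# Encode a prompt to ids for the model, then detokenize ids back to text
src_ids = sp.encode("Should homework be banned?", out_type=int)
print(src_ids[:10])
print(sp.decode(src_ids))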

Data

Training data is published separately as a HF dataset. See the source repo (https://github.com/ada-flo/nlp-hack) for the preprocessing pipeline.

License

CC BY 4.0. Underlying corpora retain their original licenses; consult the source repo for details before commercial use.
