# Model Card for lora-gemma327b-xllora-pos
This model is a fine-tuned version of `google/gemma-3-27b-it`, trained using TRL.

This repository contains the XL-LoRA **positive** adapter used in the paper *Bootstrapping Embeddings for Low Resource Languages*. The adapter generates synthetic positive examples for multilingual embedding training pipelines. It is not merged with the base model and must be applied to the Gemma 3 27B base model at inference time.
## Model Details
| Property | Value |
|---|---|
| Base model | Gemma 3 27B (`google/gemma-3-27b-it`) |
| Method | XL-LoRA |
| Adapter type | LoRA |
| Purpose | Synthetic positive generation |
The adapter is part of the XL-LoRA methodology for generating multilingual contrastive training data.
## Intended Use
This adapter is used to generate synthetic training data for multilingual sentence embedding models. It is one of a pair of adapters:
| Adapter | Purpose |
|---|---|
| xllora-pos | Generate positive examples |
| xllora-neg | Generate hard negative examples |
These examples are then combined into triplets of the form `(anchor, positive, hard_negative)` for training sentence embedding models.
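As a minimal illustration, per-anchor generations from the two adapters can be zipped into triplet records; the field names below are illustrative, not the authors' exact schema:

```python
# Sketch of assembling (anchor, positive, hard_negative) triplet records
# from per-anchor generations. Field names are illustrative only.
def build_triplets(anchors, positives, hard_negatives):
    """Zip parallel lists of sentences into triplet dictionaries."""
    return [
        {"anchor": a, "positive": p, "hard_negative": n}
        for a, p, n in zip(anchors, positives, hard_negatives)
    ]

triplets = build_triplets(
    ["The cat sat on the mat."],
    ["A cat was sitting on the mat."],   # generated by xllora-pos
    ["The dog slept on the sofa."],      # generated by xllora-neg
)
```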
## Usage
The adapter must be loaded together with the Gemma 3 27B base model using the PEFT library.
Example:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "google/gemma-3-27b-it"
adapter_model = "mbasoz/lora-gemma327b-xllora-pos"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")

# Attach the XL-LoRA positive adapter to the frozen base model.
model = PeftModel.from_pretrained(model, adapter_model)
```
## Data Synthesis
Synthetic triplets are generated using the script `src/generate_answers_mgpu_orch.py` from the official code repository:

https://github.com/mbasoz/xllora-embedding

Example scripts for generating data:

- Negative generation: `scripts/data_synthesis_neg.sh`
- Positive generation: `scripts/data_synthesis_pos.sh`
These scripts demonstrate how the adapter is used to generate multilingual triplet data.
## Training Procedure
The XL-LoRA adapters were trained with supervised fine-tuning (SFT) using `src/lora_training.py`. Example training commands are provided in:

- Negative adapter training: `scripts/xllora_train_negative.sh`
- Positive adapter training: `scripts/xllora_train_positive.sh`
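As a sketch, an SFT pair for the positive adapter might be formatted along these lines; the prompt wording and field names here are hypothetical, and the actual template is defined in `src/lora_training.py` in the repository:

```python
# Hypothetical SFT formatting for the positive adapter: the anchor
# sentence becomes the prompt, and a paraphrase is the target
# completion. The real template lives in src/lora_training.py.
def format_positive_example(anchor: str, positive: str) -> dict:
    prompt = (
        "Generate a sentence with the same meaning as the following:\n"
        f"{anchor}\n"
    )
    return {"prompt": prompt, "completion": positive}

example = format_positive_example(
    "The cat sat on the mat.",
    "A cat was sitting on the mat.",
)
```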
## Related Resources
- Paper: Bootstrapping Embeddings for Low Resource Languages
- Code: https://github.com/mbasoz/xllora-embedding
- Synthetic dataset: https://huggingface.co/datasets/mbasoz/xllora-datasets
- Training datasets: https://github.com/mbasoz/xllora-embedding/blob/main/data/mixed_parallel_xnli_14l_opusmt_10k_fin_neg.csv
## Framework Versions
- PEFT: 0.15.2
- TRL: 0.19.0
- Transformers: 4.53.1
- Pytorch: 2.6.0+cu126
- Datasets: 3.1.0
- Tokenizers: 0.21.2
## Citations
If you use these adapters in your research, please cite the following paper:
```bibtex
@article{basoz2026bootstrappingembeddings,
  title={Bootstrapping Embeddings for Low Resource Languages},
  author={Merve Basoz and Andrew Horne and Mattia Opper},
  year={2026},
  eprint={2603.01732},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2603.01732},
  note={Accepted to the LoResLM Workshop at EACL 2026}
}
```
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
  title = {{TRL: Transformer Reinforcement Learning}},
  author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year = 2020,
  journal = {GitHub repository},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```
## License
This model is released under the MIT License.