# Parallel-T5-Translation-PyTorch
## Project Title and Introduction
**Parallel-T5-Translation-PyTorch** is a custom, optimized Transformer-based sequence-to-sequence model inspired by **T5-Small**, developed for **English-to-French Machine Translation**.
The core innovation in this project is the **Parallel Multi-Head Attention** mechanism, designed to enable experimentation with **model parallelism** and improve **attention efficiency**. This implementation provides a foundation for studying how attention heads can be executed concurrently to enhance performance in translation tasks.
---
## Custom Model Architecture: Parallel Attention
### Overview
Our model, **ParallelT5Small**, replaces the standard Multi-Head Attention (MHA) with a **novel Parallel Multi-Head Attention (P-MHA)** layer.
- **Standard MHA:**
Computes one set of **Query (Q)**, **Key (K)**, and **Value (V)** projections, then splits the resulting vectors across all heads.
- **Parallel MHA (Proposed):**
Splits the attention mechanism into **two parallel streams**, each using separate **Q/K/V projection weights** for half of the attention heads.
The output of each stream is **independently projected** back to the hidden dimension, and the two results are **summed** to form the final attention output (a minimal sketch follows this list).
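To make the description concrete, here is a minimal, self-contained sketch of the two-stream idea in PyTorch. It is illustrative only, not the exact `ParallelT5Small` layer: the class name, dimensions, and the use of `F.scaled_dot_product_attention` (PyTorch ≥ 2.0) are assumptions, and masking, cross-attention, and T5's relative position bias are omitted.

```python
# Illustrative sketch of Parallel Multi-Head Attention (P-MHA) as described above.
# Not the project's exact implementation; names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int = 512, num_heads: int = 8):
        super().__init__()
        assert num_heads % 2 == 0, "heads are split evenly across two streams"
        self.heads_per_stream = num_heads // 2
        self.d_head = d_model // num_heads
        d_stream = self.heads_per_stream * self.d_head

        # Each stream owns separate Q/K/V projections covering half of the heads.
        self.qkv_a = nn.Linear(d_model, 3 * d_stream)
        self.qkv_b = nn.Linear(d_model, 3 * d_stream)

        # Each stream is projected back to the hidden dimension independently.
        self.out_a = nn.Linear(d_stream, d_model)
        self.out_b = nn.Linear(d_stream, d_model)

    def _stream(self, x, qkv, out_proj):
        B, T, _ = x.shape
        q, k, v = qkv(x).chunk(3, dim=-1)
        # (B, T, h*d) -> (B, h, T, d) for per-head attention
        q, k, v = (t.view(B, T, self.heads_per_stream, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v)
        attn = attn.transpose(1, 2).reshape(B, T, -1)
        return out_proj(attn)

    def forward(self, x):
        # The two streams share no parameters; their outputs are summed.
        return self._stream(x, self.qkv_a, self.out_a) + self._stream(x, self.qkv_b, self.out_b)
```

Because the two streams share no parameters, their forward passes are independent and could in principle be dispatched to separate devices or CUDA streams, which is the model-parallelism angle this project is set up to explore.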
### Goal
This architecture serves as a foundation for:
- Exploring **architectural variants** of the Transformer.
- Studying the **effects of parallelized attention** on translation performance.
- Investigating **scalability** in distributed training and **efficiency** on specialized hardware (e.g., GPUs or TPUs).
---
## Model Architecture

<p align="center">
<img src="Assets/Architecture_diagram.jpg" alt="Parallel T5 Architecture" width="100%">
</p>
---
## Training & Evaluation Metrics (Epoch 37)

| Metric | Train Result | Validation Result | Goal |
| --- | --- | --- | --- |
| **Loss (Cross-Entropy)** | 4.2213 | 4.8907 | Decrease loss below **2.0** |
| **Token Accuracy** | ≈ 18.18% | ≈ 15.20% | Achieve **60%+** |
| **BLEU Score** | *To be implemented* | *To be implemented* | Target: **30–40** |
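The README does not show how **Token Accuracy** is computed. A common per-token definition that ignores padding positions is sketched below; it is illustrative only, and the default pad id of 0 (which matches the T5 tokenizer) is an assumption about this project's setup.

```python
# Illustrative token-accuracy metric: argmax over the vocabulary, with padding
# positions masked out of both the numerator and the denominator.
import torch

def token_accuracy(logits: torch.Tensor, labels: torch.Tensor, pad_id: int = 0) -> float:
    preds = logits.argmax(dim=-1)      # (batch, seq_len)
    mask = labels != pad_id            # ignore padded target positions
    correct = (preds == labels) & mask
    return correct.sum().item() / mask.sum().clamp(min=1).item()
```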
---
## Installation and Setup
### Installation
To set up the project locally, follow these steps.
**Python 3.8+** is required.
---
#### 1. Clone the Repository
```bash
git clone https://github.com/YourUsername/Parallel-T5-Translation-PyTorch.git
cd Parallel-T5-Translation-PyTorch
```
#### 2. Create and Activate a Conda Environment
```bash
conda create -n parallel-t5 python=3.9
conda activate parallel-t5
```
#### 3. Install PyTorch
```bash
# Install PyTorch (use the appropriate CUDA version for your setup)
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
```
#### 4. Install Project Dependencies
```bash
# Install project dependencies
pip install -r requirements.txt
```
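After installation, an optional quick check confirms that PyTorch imports correctly and whether a CUDA device is visible:

```python
# Optional sanity check: verify the PyTorch install and CUDA visibility.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
```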
## Training and Preprocessing
The workflow consists of two main steps: **data preparation** and **model training**.
---
### Step 1: Data Preprocessing
This step performs the following (an illustrative sketch of the flow appears after the command below):
- Downloads the **GlobalVoices EN-FR** dataset
- Tokenizes data using the **T5 tokenizer**
- Splits into **train**, **validation**, and **test** sets
- Saves processed tensors to `./data/processed`
**Run the preprocessing:**
```bash
python run.py
```
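For orientation, the sketch below shows what such a preprocessing flow can look like with the Hugging Face `AutoTokenizer`. It is illustrative only, not the project's actual `run.py`: the file paths, function name, task prefix, and maximum length are assumptions, and the train/validation/test split is omitted for brevity.

```python
# Illustrative preprocessing flow (not the project's actual run.py).
# Assumes raw parallel EN/FR text files and the Hugging Face t5-small tokenizer.
import os
import torch
from transformers import AutoTokenizer

def preprocess(en_path: str, fr_path: str,
               out_dir: str = "./data/processed", max_len: int = 128) -> None:
    tokenizer = AutoTokenizer.from_pretrained("t5-small")

    with open(en_path, encoding="utf-8") as f_en, open(fr_path, encoding="utf-8") as f_fr:
        en_lines = [line.strip() for line in f_en]
        fr_lines = [line.strip() for line in f_fr]

    # T5 conventionally uses a task prefix on the source side.
    sources = ["translate English to French: " + s for s in en_lines]

    enc = tokenizer(sources, padding="max_length", truncation=True,
                    max_length=max_len, return_tensors="pt")
    dec = tokenizer(fr_lines, padding="max_length", truncation=True,
                    max_length=max_len, return_tensors="pt")

    os.makedirs(out_dir, exist_ok=True)
    # In practice, pad ids in the labels are often replaced with -100 so the loss ignores them.
    torch.save({"input_ids": enc.input_ids,
                "attention_mask": enc.attention_mask,
                "labels": dec.input_ids},
               os.path.join(out_dir, "train.pt"))

if __name__ == "__main__":
    # Hypothetical file locations for the downloaded GlobalVoices EN-FR corpus.
    preprocess("data/raw/GlobalVoices.en-fr.en", "data/raw/GlobalVoices.en-fr.fr")
```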
---
## References
This project is built upon the foundational work of the **T5 model** and utilizes the publicly available **GlobalVoices dataset**.
### 🔹 T5 (Text-to-Text Transfer Transformer)
The model architecture is heavily inspired by the **T5 framework**, which casts all NLP problems into a text-to-text format.
**Paper:**
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
*Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (2019).*
**Link:** [https://arxiv.org/abs/1910.10683](https://arxiv.org/abs/1910.10683)
### 🔹 OPUS - GlobalVoices Dataset
The parallel English-French data used for training is sourced from the **OPUS collection’s GlobalVoices corpus**.
**Resource:**
Jörg Tiedemann. *Parallel Data, Tools and Interfaces in OPUS* (LREC 2012).
**Link (GlobalVoices source):**
[https://object.pouta.csc.fi/OPUS-GlobalVoices/v2018q4/moses/en-fr.txt.zip](https://object.pouta.csc.fi/OPUS-GlobalVoices/v2018q4/moses/en-fr.txt.zip)
### 🔹 Hugging Face Transformers Library
The **AutoTokenizer** and several best practices for transformer training and dataset handling are derived from the **Hugging Face** ecosystem.
**Library:**
[https://huggingface.co/docs/transformers/index](https://huggingface.co/docs/transformers/index)
---