# Parallel-T5-Translation-PyTorch
## Project Title and Introduction
**Parallel-T5-Translation-PyTorch** is a custom, optimized Transformer-based sequence-to-sequence model inspired by **T5-Small**, developed for **English-to-French Machine Translation**.
The core innovation in this project is the **Parallel Multi-Head Attention** mechanism, designed to enable experimentation with **model parallelism** and improve **attention efficiency**. This implementation provides a foundation for studying how attention heads can be executed concurrently to enhance performance in translation tasks.
---
## Custom Model Architecture: Parallel Attention
### Overview
Our model, **ParallelT5Small**, replaces the standard Multi-Head Attention (MHA) with a **novel Parallel Multi-Head Attention (P-MHA)** layer.
- **Standard MHA:**
Computes one set of **Query (Q)**, **Key (K)**, and **Value (V)** projections, then splits the resulting vectors across all heads.
- **Parallel MHA (Proposed):**
Splits the attention mechanism into **two parallel streams**, each using separate **Q/K/V projection weights** for half of the attention heads.
The output of each stream is **independently projected** back to the hidden dimension, and the two results are **summed** to form the final attention output (a minimal sketch follows this list).
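To make the description concrete, here is a minimal, self-contained sketch of the two-stream idea in PyTorch. It is illustrative only, not the exact `ParallelT5Small` layer: the class name, dimensions, and the use of `F.scaled_dot_product_attention` (PyTorch ≥ 2.0) are assumptions, and masking, cross-attention, and T5's relative position bias are omitted.

```python
# Illustrative sketch of Parallel Multi-Head Attention (P-MHA) as described above.
# Not the project's exact implementation; names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int = 512, num_heads: int = 8):
        super().__init__()
        assert num_heads % 2 == 0, "heads are split evenly across two streams"
        self.heads_per_stream = num_heads // 2
        self.d_head = d_model // num_heads
        d_stream = self.heads_per_stream * self.d_head

        # Each stream owns separate Q/K/V projections covering half of the heads.
        self.qkv_a = nn.Linear(d_model, 3 * d_stream)
        self.qkv_b = nn.Linear(d_model, 3 * d_stream)

        # Each stream is projected back to the hidden dimension independently.
        self.out_a = nn.Linear(d_stream, d_model)
        self.out_b = nn.Linear(d_stream, d_model)

    def _stream(self, x, qkv, out_proj):
        B, T, _ = x.shape
        q, k, v = qkv(x).chunk(3, dim=-1)
        # (B, T, h*d) -> (B, h, T, d) for per-head attention
        q, k, v = (t.view(B, T, self.heads_per_stream, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v)
        attn = attn.transpose(1, 2).reshape(B, T, -1)
        return out_proj(attn)

    def forward(self, x):
        # The two streams share no parameters; their outputs are summed.
        return self._stream(x, self.qkv_a, self.out_a) + self._stream(x, self.qkv_b, self.out_b)
```

Because the two streams share no parameters, their forward passes are independent and could in principle be dispatched to separate devices or CUDA streams, which is the model-parallelism angle this project is set up to explore.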
### Goal
This architecture serves as a foundation for:
- Exploring **architectural variants** of the Transformer.
- Studying the **effects of parallelized attention** on translation performance.
- Investigating **scalability** in distributed training and **efficiency** on specialized hardware (e.g., GPUs or TPUs).
---
## Model Architecture

<p align="center">
<img src="Assets/Architecture_diagram.jpg" alt="Parallel T5 Architecture" width="100%">
</p>
---
## Training & Evaluation Metrics (Epoch 37)

| Metric | Train Result | Validation Result | Goal |
| --- | --- | --- | --- |
| **Loss (Cross-Entropy)** | 4.2213 | 4.8907 | Decrease loss below **2.0** |
| **Token Accuracy** | ≈ 18.18% | ≈ 15.20% | Achieve **60%+** |
| **BLEU Score** | *To be implemented* | *To be implemented* | Target: **30–40** |
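The README does not show how **Token Accuracy** is computed. A common per-token definition that ignores padding positions is sketched below; it is illustrative only, and the default pad id of 0 (which matches the T5 tokenizer) is an assumption about this project's setup.

```python
# Illustrative token-accuracy metric: argmax over the vocabulary, with padding
# positions masked out of both the numerator and the denominator.
import torch

def token_accuracy(logits: torch.Tensor, labels: torch.Tensor, pad_id: int = 0) -> float:
    preds = logits.argmax(dim=-1)      # (batch, seq_len)
    mask = labels != pad_id            # ignore padded target positions
    correct = (preds == labels) & mask
    return correct.sum().item() / mask.sum().clamp(min=1).item()
```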
---
## Installation and Setup
### Installation
To set up the project locally, follow these steps.
**Python 3.8+** is required.
---
#### 1. Clone the Repository
```bash
git clone https://github.com/YourUsername/Parallel-T5-Translation-PyTorch.git
cd Parallel-T5-Translation-PyTorch
```
#### 2. Create and Activate a Conda Environment
```bash
conda create -n parallel-t5 python=3.9
conda activate parallel-t5
```
#### 3. Install PyTorch
```bash
# Install PyTorch (use the appropriate CUDA version for your setup)
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
```
#### 4. Install Project Dependencies
```bash
# Install project dependencies
pip install -r requirements.txt
```
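After installation, an optional quick check confirms that PyTorch imports correctly and whether a CUDA device is visible:

```python
# Optional sanity check: verify the PyTorch install and CUDA visibility.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
```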
## Training and Preprocessing
The workflow consists of two main steps: **data preparation** and **model training**.
---
### Step 1: Data Preprocessing
This step performs the following (an illustrative sketch of the flow appears after the command below):
- Downloads the **GlobalVoices EN-FR** dataset
- Tokenizes data using the **T5 tokenizer**
- Splits into **train**, **validation**, and **test** sets
- Saves processed tensors to `./data/processed`
**Run the preprocessing:**
```bash
python run.py
```
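For orientation, the sketch below shows what such a preprocessing flow can look like with the Hugging Face `AutoTokenizer`. It is illustrative only, not the project's actual `run.py`: the file paths, function name, task prefix, and maximum length are assumptions, and the train/validation/test split is omitted for brevity.

```python
# Illustrative preprocessing flow (not the project's actual run.py).
# Assumes raw parallel EN/FR text files and the Hugging Face t5-small tokenizer.
import os
import torch
from transformers import AutoTokenizer

def preprocess(en_path: str, fr_path: str,
               out_dir: str = "./data/processed", max_len: int = 128) -> None:
    tokenizer = AutoTokenizer.from_pretrained("t5-small")

    with open(en_path, encoding="utf-8") as f_en, open(fr_path, encoding="utf-8") as f_fr:
        en_lines = [line.strip() for line in f_en]
        fr_lines = [line.strip() for line in f_fr]

    # T5 conventionally uses a task prefix on the source side.
    sources = ["translate English to French: " + s for s in en_lines]

    enc = tokenizer(sources, padding="max_length", truncation=True,
                    max_length=max_len, return_tensors="pt")
    dec = tokenizer(fr_lines, padding="max_length", truncation=True,
                    max_length=max_len, return_tensors="pt")

    os.makedirs(out_dir, exist_ok=True)
    # In practice, pad ids in the labels are often replaced with -100 so the loss ignores them.
    torch.save({"input_ids": enc.input_ids,
                "attention_mask": enc.attention_mask,
                "labels": dec.input_ids},
               os.path.join(out_dir, "train.pt"))

if __name__ == "__main__":
    # Hypothetical file locations for the downloaded GlobalVoices EN-FR corpus.
    preprocess("data/raw/GlobalVoices.en-fr.en", "data/raw/GlobalVoices.en-fr.fr")
```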
---
## References
This project is built upon the foundational work of the **T5 model** and utilizes the publicly available **GlobalVoices dataset**.
### 🔹 T5 (Text-to-Text Transfer Transformer)
The model architecture is heavily inspired by the **T5 framework**, which casts all NLP problems into a text-to-text format.
**Paper:**
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
*Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (2019).*
**Link:** [https://arxiv.org/abs/1910.10683](https://arxiv.org/abs/1910.10683)
### 🔹 OPUS - GlobalVoices Dataset
The parallel English-French data used for training is sourced from the **OPUS collection’s GlobalVoices corpus**.
**Resource:**
Jörg Tiedemann. *Parallel Data, Tools and Interfaces in OPUS* (LREC 2012).
**Link (GlobalVoices source):**
[https://object.pouta.csc.fi/OPUS-GlobalVoices/v2018q4/moses/en-fr.txt.zip](https://object.pouta.csc.fi/OPUS-GlobalVoices/v2018q4/moses/en-fr.txt.zip)
### 🔹 Hugging Face Transformers Library
The **AutoTokenizer** and several best practices for transformer training and dataset handling are derived from the **Hugging Face** ecosystem.
**Library:**
[https://huggingface.co/docs/transformers/index](https://huggingface.co/docs/transformers/index)
---