# Parallel-T5-Translation-PyTorch

## Introduction

**Parallel-T5-Translation-PyTorch** is a custom, optimized Transformer-based sequence-to-sequence model inspired by **T5-Small**, developed for **English-to-French machine translation**.

The core innovation in this project is the **Parallel Multi-Head Attention** mechanism, designed to enable experimentation with **model parallelism** and improve **attention efficiency**. This implementation provides a foundation for studying how attention heads can be executed concurrently to enhance performance in translation tasks.

---
## Custom Model Architecture: Parallel Attention

### Overview

Our model, **ParallelT5Small**, replaces the standard Multi-Head Attention (MHA) with a novel **Parallel Multi-Head Attention (P-MHA)** layer.

- **Standard MHA:**
  Computes one set of **Query (Q)**, **Key (K)**, and **Value (V)** projections, then splits the resulting vectors across all heads.
- **Parallel MHA (proposed):**
  Splits the attention mechanism into **two parallel streams**, each using separate **Q/K/V projection weights** for half of the attention heads.
  The results from both streams are **independently projected** back to the hidden dimension and then **summed** to form the final attention output (a minimal sketch of such a layer is shown below).
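
The following is a minimal, self-contained PyTorch sketch of how such a layer could look. Class and parameter names are illustrative, and T5-specific details such as relative position bias and encoder-decoder cross-attention are omitted; it assumes PyTorch 2.0+ for `scaled_dot_product_attention`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelMultiHeadAttention(nn.Module):
    """Sketch of P-MHA: two independent attention streams, each owning half of
    the heads with its own Q/K/V and output projections; the two stream outputs
    are summed to form the final attention output."""

    def __init__(self, d_model: int, num_heads: int, dropout: float = 0.1):
        super().__init__()
        assert num_heads % 2 == 0, "num_heads must be even to split into two streams"
        self.heads_per_stream = num_heads // 2
        self.head_dim = d_model // num_heads
        stream_dim = self.heads_per_stream * self.head_dim

        # Separate Q/K/V projections for each stream (half of the heads each).
        self.qkv_a = nn.Linear(d_model, 3 * stream_dim)
        self.qkv_b = nn.Linear(d_model, 3 * stream_dim)
        # Each stream is independently projected back to the hidden dimension.
        self.out_a = nn.Linear(stream_dim, d_model)
        self.out_b = nn.Linear(stream_dim, d_model)
        self.dropout = nn.Dropout(dropout)

    def _stream(self, x, qkv_proj, out_proj, mask):
        bsz, seq_len, _ = x.shape
        q, k, v = qkv_proj(x).chunk(3, dim=-1)

        def split_heads(t):
            # (batch, seq, stream_dim) -> (batch, heads_per_stream, seq, head_dim)
            return t.reshape(bsz, seq_len, self.heads_per_stream, self.head_dim).transpose(1, 2)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        attn = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
        attn = attn.transpose(1, 2).reshape(bsz, seq_len, -1)
        return out_proj(attn)

    def forward(self, x, mask=None):
        # The two streams are fully independent, so they could in principle be
        # placed on different devices for model parallelism.
        out = self._stream(x, self.qkv_a, self.out_a, mask) \
            + self._stream(x, self.qkv_b, self.out_b, mask)
        return self.dropout(out)
```

Summing the two independently projected streams keeps the output dimensionality at `d_model`, so the layer can act as a drop-in replacement for standard MHA inside the Transformer block.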
### Goal

This architecture serves as a foundation for:

- Exploring **architectural variants** of the Transformer.
- Studying the **effects of parallelized attention** on translation performance.
- Investigating **scalability** in distributed training and **efficiency** on specialized hardware (e.g., GPUs or TPUs).

---
<h2>Model Architecture</h2>

<p align="center">
  <img src="Assets/Architecture_diagram.jpg" alt="Parallel T5 Architecture" width="100%">
</p>

---
<h2>Training & Evaluation Metrics (Epoch 37)</h2>

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>Train Result</th>
      <th>Validation Result</th>
      <th>Goal</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Loss (Cross-Entropy)</strong></td>
      <td>4.2213</td>
      <td>4.8907</td>
      <td>Decrease loss below <strong>2.0</strong></td>
    </tr>
    <tr>
      <td><strong>Token Accuracy</strong></td>
      <td>≈ 18.18%</td>
      <td>≈ 15.20%</td>
      <td>Achieve <strong>60%+</strong></td>
    </tr>
    <tr>
      <td><strong>BLEU Score</strong></td>
      <td><em>To be implemented</em></td>
      <td><em>To be implemented</em></td>
      <td>Target: <strong>30–40</strong></td>
    </tr>
  </tbody>
</table>
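
Token accuracy here refers to the fraction of non-padding target tokens whose highest-scoring prediction matches the reference token. The sketch below shows one way such a metric can be computed; the default `pad_id=0` (the T5 tokenizer's `<pad>` token) is an assumption for illustration.

```python
import torch

def token_accuracy(logits: torch.Tensor, labels: torch.Tensor, pad_id: int = 0) -> float:
    """Fraction of non-padding target tokens predicted correctly.

    logits: (batch, seq_len, vocab_size) decoder outputs
    labels: (batch, seq_len) reference token ids; positions equal to pad_id are ignored
    (pad_id=0 matches the T5 tokenizer's <pad> token, an assumption for this sketch).
    """
    preds = logits.argmax(dim=-1)
    mask = labels.ne(pad_id)                      # only score real target tokens
    correct = (preds.eq(labels) & mask).sum()
    return (correct.float() / mask.sum().clamp(min=1)).item()
```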
---
## Installation and Setup

### Installation

To set up the project locally, follow these steps. **Python 3.8+** is required.

---

#### 1. Clone the Repository

```bash
git clone https://github.com/YourUsername/Parallel-T5-Translation-PyTorch.git
cd Parallel-T5-Translation-PyTorch
```
#### 2. Create and Activate a Conda Environment

```bash
conda create -n parallel-t5 python=3.9
conda activate parallel-t5
```

#### 3. Install PyTorch

```bash
# Install PyTorch (use the appropriate CUDA version for your setup)
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
```

#### 4. Install Project Dependencies

```bash
# Install project dependencies
pip install -r requirements.txt
```
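
As an optional sanity check, the snippet below confirms that PyTorch imports correctly and reports whether a CUDA device is visible (CPU-only setups will simply print `False`):

```python
import torch

# Quick post-install check: print the installed version and GPU visibility.
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```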
## Training and Preprocessing

The workflow consists of two main steps: **data preparation** and **model training**.

---

### Step 1: Data Preprocessing

This step performs the following:

- Downloads the **GlobalVoices EN-FR** dataset
- Tokenizes data using the **T5 tokenizer**
- Splits into **train**, **validation**, and **test** sets
- Saves processed tensors to `./data/processed`

**Run the preprocessing:**

```bash
python run.py
```
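
The tokenization step typically follows the standard Hugging Face pattern. The sketch below illustrates the general idea; the task prefix, `max_length`, and output file name are illustrative assumptions, not necessarily the script's exact settings:

```python
import os
import torch
from transformers import AutoTokenizer

# Load the T5 tokenizer used to encode both source and target text.
tokenizer = AutoTokenizer.from_pretrained("t5-small")

src_lines = ["Hello, how are you?"]            # English source sentences
tgt_lines = ["Bonjour, comment allez-vous ?"]  # French references

# T5 is a text-to-text model, so a task prefix is conventionally prepended.
inputs = tokenizer(
    ["translate English to French: " + s for s in src_lines],
    padding="max_length", truncation=True, max_length=128, return_tensors="pt",
)
targets = tokenizer(
    tgt_lines, padding="max_length", truncation=True, max_length=128, return_tensors="pt",
)

# Save the encoded tensors under the output directory noted above.
os.makedirs("./data/processed", exist_ok=True)
torch.save(
    {"input_ids": inputs["input_ids"],
     "attention_mask": inputs["attention_mask"],
     "labels": targets["input_ids"]},
    "./data/processed/train.pt",  # illustrative file name
)
```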
---
## References

This project is built upon the foundational work of the **T5 model** and utilizes the publicly available **GlobalVoices dataset**.

### 🔹 T5 (Text-to-Text Transfer Transformer)

The model architecture is heavily inspired by the **T5 framework**, which casts all NLP problems into a text-to-text format.

**Paper:**
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
*Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer* (2019).

**Link:** [https://arxiv.org/abs/1910.10683](https://arxiv.org/abs/1910.10683)

### 🔹 OPUS - GlobalVoices Dataset

The parallel English-French data used for training is sourced from the **OPUS collection's GlobalVoices corpus**.

**Resource:**
Jörg Tiedemann. *Parallel Data, Tools and Interfaces in OPUS* (LREC 2012).

**Link (GlobalVoices source):**
[https://object.pouta.csc.fi/OPUS-GlobalVoices/v2018q4/moses/en-fr.txt.zip](https://object.pouta.csc.fi/OPUS-GlobalVoices/v2018q4/moses/en-fr.txt.zip)

### 🔹 Hugging Face Transformers Library

The **AutoTokenizer** and several best practices for transformer training and dataset handling are derived from the **Hugging Face** ecosystem.

**Library:**
[https://huggingface.co/docs/transformers/index](https://huggingface.co/docs/transformers/index)

---