Multilingual Knowledge RAG Bot – Cross-Lingual Retrieval-Augmented Generation
This model is designed for cross-lingual question answering using Retrieval-Augmented Generation (RAG).
It accepts documents in multiple languages (Urdu, Hindi, Spanish, and English) and can answer in the same language or a different one.
Key Features
- LLM Used: Meta-Llama-3-8B-Instruct
- Embedding Model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- RAG Pipeline: FAISS-based vector search + context injection
- Training/Processing: Implemented entirely in Google Colab using open-source tools only
- No paid APIs: free to run and deploy
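The retrieval half of the pipeline listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes the `faiss` and `sentence-transformers` packages are installed, and the helper names (`chunk_text`, `build_index`, `retrieve`), the word-based chunking rule, and the choice of a flat inner-product index are all assumptions made for the example.

```python
from typing import List, Tuple

# Embedding model named in the feature list above.
EMBEDDING_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"


def chunk_text(text: str, max_words: int = 120) -> List[str]:
    """Split a document into chunks of at most max_words words.

    Hypothetical chunking rule; the original project may chunk differently.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_index(chunks: List[str]):
    """Embed the chunks and store them in a FAISS inner-product index."""
    # Heavy dependencies are imported lazily so chunk_text works without them.
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(EMBEDDING_MODEL)
    # Normalized vectors make inner product equal to cosine similarity.
    vectors = model.encode(
        chunks, convert_to_numpy=True, normalize_embeddings=True
    ).astype("float32")
    index = faiss.IndexFlatIP(vectors.shape[1])  # exact (brute-force) search
    index.add(vectors)
    return model, index


def retrieve(question: str, model, index, chunks: List[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k (chunk, similarity score) pairs for the question."""
    query = model.encode(
        [question], convert_to_numpy=True, normalize_embeddings=True
    ).astype("float32")
    scores, ids = index.search(query, min(k, len(chunks)))
    # FAISS pads missing results with id -1, so those are skipped.
    return [(chunks[i], float(s)) for s, i in zip(scores[0], ids[0]) if i != -1]
```

Because the embedding model is multilingual, a question in one language can match chunks written in another. For example, `retrieve("What is the capital of Pakistan?", model, index, chunks)` could surface a relevant Urdu passage if one had been indexed.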
Techniques Used
- Vector Database: FAISS for similarity search
- Cross-Lingual Embeddings: multilingual sentence transformers
- Prompt Engineering: Context-aware question answering
- Open-Source Deployment Ready: Hugging Face Spaces compatible
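The context-injection and prompt-engineering step could look something like the sketch below. It is a hedged illustration only: the function name `build_prompt`, the `answer_language` parameter, and the instruction wording are hypothetical, and the project's real prompt template is not shown in this document.

```python
from typing import Sequence


def build_prompt(question: str, contexts: Sequence[str], answer_language: str = "English") -> str:
    """Combine retrieved passages and the question into a single prompt string.

    Hypothetical template: the passages are numbered so the model can tell
    them apart, and the desired answer language is stated explicitly.
    """
    numbered = "\n".join(
        f"[{i}] {passage.strip()}" for i, passage in enumerate(contexts, start=1)
    )
    return (
        "Use only the context below to answer the question.\n"
        f"Answer in {answer_language}. If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


if __name__ == "__main__":
    demo = build_prompt(
        "¿Qué es FAISS?",
        ["FAISS is a library for efficient similarity search."],
        answer_language="Spanish",
    )
    print(demo)
```

Here `contexts` would be the top-ranked passages returned by the vector search, and the resulting string would be sent to the LLM as the user message. Stating the answer language in the prompt is one simple way to get a reply in a language different from that of the source documents.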
License
Apache-2.0