# Embedding Inference API

A FastAPI-based inference service for generating embeddings using JobBERT v2/v3, Jina AI, and Voyage AI.

## Features

- **Multiple Models**: JobBERT v2/v3 (job-specific), Jina AI v3 (general-purpose), Voyage AI (state-of-the-art)
- **RESTful API**: Easy-to-use HTTP endpoints
- **Batch Processing**: Process multiple texts in a single request
- **Task-Specific Embeddings**: Support for different embedding tasks (retrieval, classification, etc.)
- **Docker Ready**: Easy deployment to Hugging Face Spaces or any Docker environment

## Supported Models

| Model | Dimension | Max Tokens | Best For |
|-------|-----------|------------|----------|
| JobBERT v2 | 768 | 512 | Job titles and descriptions |
| JobBERT v3 | 768 | 512 | Job titles (improved performance) |
| Jina AI v3 | 1024 | 8,192 | General text, long documents |
| Voyage AI | 1024 | 32,000 | High-quality embeddings (requires API key) |

## Quick Start

### Local Development

1. **Install dependencies:**

   ```bash
   cd embedding
   pip install -r requirements.txt
   ```

2. **Run the API:**

   ```bash
   python api.py
   ```

3. **Access the API:**
   - API: http://localhost:7860
   - Docs: http://localhost:7860/docs

### Docker Deployment

1. **Build the image:**

   ```bash
   docker build -t embedding-api .
   ```

2. **Run the container:**

   ```bash
   docker run -p 7860:7860 embedding-api
   ```

3. **With Voyage AI (optional):**

   ```bash
   docker run -p 7860:7860 -e VOYAGE_API_KEY=your_key_here embedding-api
   ```

## Hugging Face Spaces Deployment

### Option 1: Using Hugging Face CLI

1. **Install Hugging Face CLI:**

   ```bash
   pip install huggingface_hub
   huggingface-cli login
   ```

2. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose "Docker" as the Space SDK
   - Name your space (e.g., `your-username/embedding-api`)

3. **Clone and push:**

   ```bash
   git clone https://huggingface.co/spaces/your-username/embedding-api
   cd embedding-api

   # Copy files from embedding folder
   cp /path/to/embedding/Dockerfile .
   cp /path/to/embedding/api.py .
   cp /path/to/embedding/requirements.txt .
   cp /path/to/embedding/README.md .

   git add .
   git commit -m "Initial commit"
   git push
   ```

4. **Configure environment (optional):**
   - Go to your Space settings
   - Add `VOYAGE_API_KEY` secret if using Voyage AI

### Option 2: Manual Upload

1. Create a new Docker Space on Hugging Face
2. Upload these files:
   - `Dockerfile`
   - `api.py`
   - `requirements.txt`
   - `README.md`
3. Add environment variables in Settings if needed

## API Usage

### Health Check

```bash
curl http://localhost:7860/health
```

Response:

```json
{
  "status": "healthy",
  "models_loaded": ["jobbertv2", "jina"],
  "voyage_available": false
}
```

### Generate Embeddings

#### JobBERT v2 (Job Titles)

```bash
curl -X POST http://localhost:7860/embed \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Software Engineer", "Data Scientist", "Product Manager"],
    "model": "jobbertv2"
  }'
```

#### JobBERT v3 (Latest, Recommended)

```bash
curl -X POST http://localhost:7860/embed \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Software Engineer", "Data Scientist", "Product Manager"],
    "model": "jobbertv3"
  }'
```

#### Jina AI (with task specification)

```bash
curl -X POST http://localhost:7860/embed \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["What is machine learning?", "How does AI work?"],
    "model": "jina",
    "task": "retrieval.query"
  }'
```

**Jina AI Tasks:**

- `retrieval.query`: For search queries
- `retrieval.passage`: For documents
- `text-matching`: For similarity (default)
- `classification`: For classification
- `separation`: For clustering

#### Voyage AI (requires API key)

```bash
curl -X POST http://localhost:7860/embed \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["This is a document to embed"],
    "model": "voyage",
    "input_type": "document"
  }'
```

**Voyage AI Input Types:**

- `document`: For documents/passages
- `query`: For search queries

### Response Format

```json
{
  "embeddings": [
    [0.123, -0.456, 0.789, ...],
    [0.234, -0.567, 0.890, ...]
  ],
  "model": "jobbertv2",
  "dimension": 768,
  "num_texts": 2
}
```

### List Available Models

```bash
curl http://localhost:7860/models
```

## Python Client Example

```python
import requests

url = "http://localhost:7860/embed"

# JobBERT v3 (recommended)
response = requests.post(url, json={
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
})
result = response.json()
embeddings = result["embeddings"]
print(f"Got {len(embeddings)} embeddings of dimension {result['dimension']}")

# JobBERT v2
response = requests.post(url, json={
    "texts": ["Product Manager"],
    "model": "jobbertv2"
})

# Jina AI with task
response = requests.post(url, json={
    "texts": ["What is Python?"],
    "model": "jina",
    "task": "retrieval.query"
})

# Voyage AI
response = requests.post(url, json={
    "texts": ["Document text here"],
    "model": "voyage",
    "input_type": "document"
})
```

## Environment Variables

- `PORT`: Server port (default: 7860)
- `VOYAGE_API_KEY`: Voyage AI API key (optional, required for Voyage embeddings)

## Interactive Documentation

Once the API is running, visit:

- **Swagger UI**: http://localhost:7860/docs
- **ReDoc**: http://localhost:7860/redoc

## Notes

- Models are downloaded automatically on first startup (~2-3GB total)
- Voyage AI requires an API key from https://www.voyageai.com/
- First request to each model may be slower due to model loading
- Use batch processing for better performance (send multiple texts at once)

## Troubleshooting

### Models not loading

- Check available disk space (need ~3GB)
- Ensure internet connection for model download
- Check logs for specific error messages

### Voyage AI not working

- Verify `VOYAGE_API_KEY` is set correctly
- Check API key has sufficient credits
- Ensure `voyageai` package is installed

### Out of memory

- Reduce batch size (process fewer texts per request)
- Use smaller models (JobBERT v2 instead of Jina)
- Increase container memory limits

## License

This API uses models with different licenses:

- JobBERT v2/v3: Apache 2.0
- Jina AI: Apache 2.0
- Voyage AI: Subject to Voyage AI terms of service
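
## Batching Client Sketch

The Notes recommend sending multiple texts per request, while the Troubleshooting section suggests reducing the batch size when memory runs out. One way to balance the two is to split a long list of texts into fixed-size batches on the client side. The following is a minimal sketch, not part of the API itself: `chunked`, `post_json`, `embed_in_batches`, and the `batch_size` default of 32 are hypothetical names and values chosen for illustration. It assumes the `/embed` endpoint and the response format documented above, and uses only the standard library so it runs without extra dependencies.

```python
import json
from typing import Callable, Iterator, List, Sequence
from urllib import request

EMBED_URL = "http://localhost:7860/embed"  # local default from the Quick Start


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def post_json(url: str, payload: dict) -> dict:
    """POST `payload` as JSON and decode the JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def embed_in_batches(
    texts: Sequence[str],
    model: str = "jobbertv3",
    batch_size: int = 32,  # hypothetical default; tune to available memory
    url: str = EMBED_URL,
    post: Callable[[str, dict], dict] = post_json,
    **extra: str,  # e.g. task="retrieval.query" or input_type="document"
) -> List[List[float]]:
    """Embed `texts` in several smaller requests; vectors keep input order."""
    embeddings: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        payload = {"texts": batch, "model": model, **extra}
        result = post(url, payload)
        # Each response carries one vector per text in the batch.
        embeddings.extend(result["embeddings"])
    return embeddings
```

With the server running, a call such as `embed_in_batches(job_titles, model="jobbertv3", batch_size=16)` would issue one request per 16 texts. The `post` parameter exists only so the transport can be swapped out, for example with `requests` or with a stub in tests.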