# Embedding Inference API

A FastAPI-based inference service for generating embeddings using JobBERT v2/v3, Jina AI, and Voyage AI.

## Features

- **Multiple Models**: JobBERT v2/v3 (job-specific), Jina AI v3 (general-purpose), Voyage AI (hosted API, requires an API key)
- **RESTful API**: Easy-to-use HTTP endpoints
- **Batch Processing**: Process multiple texts in a single request
- **Task-Specific Embeddings**: Support for different embedding tasks (retrieval, classification, etc.)
- **Docker Ready**: Easy deployment to Hugging Face Spaces or any Docker environment

## Supported Models

| Model | Dimension | Max Tokens | Best For |
|-------|-----------|------------|----------|
| JobBERT v2 | 768 | 512 | Job titles and descriptions |
| JobBERT v3 | 768 | 512 | Job titles (improved performance) |
| Jina AI v3 | 1024 | 8,192 | General text, long documents |
| Voyage AI | 1024 | 32,000 | High-quality embeddings (requires API key) |

## Quick Start

### Local Development

1. **Install dependencies:**
   ```bash
   cd embedding
   pip install -r requirements.txt
   ```

2. **Run the API:**
   ```bash
   python api.py
   ```

3. **Access the API:**
   - API: http://localhost:7860
   - Docs: http://localhost:7860/docs

### Docker Deployment

1. **Build the image:**
   ```bash
   docker build -t embedding-api .
   ```

2. **Run the container:**
   ```bash
   docker run -p 7860:7860 embedding-api
   ```

3. **With Voyage AI (optional):**
   ```bash
   docker run -p 7860:7860 -e VOYAGE_API_KEY=your_key_here embedding-api
   ```

## Hugging Face Spaces Deployment

### Option 1: Using Hugging Face CLI

1. **Install Hugging Face CLI:**
   ```bash
   pip install huggingface_hub
   huggingface-cli login
   ```

2. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Choose "Docker" as the Space SDK
   - Name your space (e.g., `your-username/embedding-api`)

3. **Clone and push:**
   ```bash
   git clone https://huggingface.co/spaces/your-username/embedding-api
   cd embedding-api
   
   # Copy files from embedding folder
   cp /path/to/embedding/Dockerfile .
   cp /path/to/embedding/api.py .
   cp /path/to/embedding/requirements.txt .
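   # Docker Spaces read their config from YAML front matter in README.md
   # (e.g. `sdk: docker`); make sure the copied README keeps or includes it
   cp /path/to/embedding/README.md .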
   
   git add .
   git commit -m "Initial commit"
   git push
   ```

4. **Configure environment (optional):**
   - Go to your Space settings
   - Add `VOYAGE_API_KEY` secret if using Voyage AI

### Option 2: Manual Upload

1. Create a new Docker Space on Hugging Face
2. Upload these files:
   - `Dockerfile`
   - `api.py`
   - `requirements.txt`
   - `README.md`
3. Add environment variables in Settings if needed

## API Usage

### Health Check

```bash
curl http://localhost:7860/health
```

Response:
```json
{
  "status": "healthy",
  "models_loaded": ["jobbertv2", "jina"],
  "voyage_available": false
}
```
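
The health response can be used as a readiness check before sending traffic. Below is a minimal sketch that assumes the response has exactly the fields shown above; `fetch_health` and `missing_models` are hypothetical helper names, not part of the API, and the sketch uses only the standard library.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:7860", timeout: float = 5.0) -> dict:
    """GET /health and decode the JSON body."""
    with urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def missing_models(health: dict, required: list[str]) -> list[str]:
    """Return the required model names that the service has not loaded."""
    if health.get("status") != "healthy":
        # An unhealthy service is treated as having nothing usable loaded.
        return list(required)
    loaded = set(health.get("models_loaded", []))
    return [name for name in required if name not in loaded]
```

For example, `missing_models(fetch_health(), ["jobbertv3"])` returns an empty list once the model you need is ready.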

### Generate Embeddings

#### JobBERT v2 (Job Titles)

```bash
curl -X POST http://localhost:7860/embed \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Software Engineer", "Data Scientist", "Product Manager"],
    "model": "jobbertv2"
  }'
```

#### JobBERT v3 (Latest, Recommended)

```bash
curl -X POST http://localhost:7860/embed \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Software Engineer", "Data Scientist", "Product Manager"],
    "model": "jobbertv3"
  }'
```

#### Jina AI (with task specification)

```bash
curl -X POST http://localhost:7860/embed \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["What is machine learning?", "How does AI work?"],
    "model": "jina",
    "task": "retrieval.query"
  }'
```

**Jina AI Tasks:**
- `retrieval.query`: For search queries
- `retrieval.passage`: For documents
- `text-matching`: For similarity (default)
- `classification`: For classification
- `separation`: For clustering
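
A typo in `task` is easy to make, so it can help to validate the value on the client before sending. This is a sketch under the assumption that the five tasks listed above are the complete set; `JINA_TASKS` and `jina_payload` are illustrative names, and how the server itself reacts to an unknown task is not covered here.

```python
# Task names copied from the list above.
JINA_TASKS = {
    "retrieval.query",
    "retrieval.passage",
    "text-matching",
    "classification",
    "separation",
}


def jina_payload(texts: list[str], task: str = "text-matching") -> dict:
    """Build the JSON body for POST /embed with the Jina model."""
    if task not in JINA_TASKS:
        raise ValueError(f"unknown Jina task {task!r}; expected one of {sorted(JINA_TASKS)}")
    return {"texts": list(texts), "model": "jina", "task": task}
```

For retrieval, a typical pairing is `retrieval.query` for the search strings and `retrieval.passage` for the documents being searched.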

#### Voyage AI (requires API key)

```bash
curl -X POST http://localhost:7860/embed \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["This is a document to embed"],
    "model": "voyage",
    "input_type": "document"
  }'
```

**Voyage AI Input Types:**
- `document`: For documents/passages
- `query`: For search queries

### Response Format

```json
{
  "embeddings": [
    [0.123, -0.456, 0.789, ...],
    [0.234, -0.567, 0.890, ...]
  ],
  "model": "jobbertv2",
  "dimension": 768,
  "num_texts": 2
}
```
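
Once you have a response in this shape, a common next step is comparing two embeddings. The sketch below checks that the response is internally consistent and computes cosine similarity in plain Python; the helper names are hypothetical, and for large batches a NumPy implementation would likely be faster.

```python
import math


def checked_embeddings(result: dict) -> list[list[float]]:
    """Return result["embeddings"] after verifying num_texts and dimension."""
    embeddings = result["embeddings"]
    if len(embeddings) != result["num_texts"]:
        raise ValueError("num_texts does not match the number of embeddings")
    for vec in embeddings:
        if len(vec) != result["dimension"]:
            raise ValueError("embedding length does not match dimension")
    return embeddings


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With `vectors = checked_embeddings(response.json())`, `cosine_similarity(vectors[0], vectors[1])` gives a score between -1 and 1, where higher means more similar.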

### List Available Models

```bash
curl http://localhost:7860/models
```

## Python Client Example

```python
import requests

url = "http://localhost:7860/embed"

# JobBERT v3 (recommended)
response = requests.post(url, json={
    "texts": ["Software Engineer", "Data Scientist"],
    "model": "jobbertv3"
}, timeout=60)
response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
result = response.json()
embeddings = result["embeddings"]
print(f"Got {len(embeddings)} embeddings of dimension {result['dimension']}")

# JobBERT v2
response = requests.post(url, json={
    "texts": ["Product Manager"],
    "model": "jobbertv2"
}, timeout=60)
response.raise_for_status()

# Jina AI with task
response = requests.post(url, json={
    "texts": ["What is Python?"],
    "model": "jina",
    "task": "retrieval.query"
}, timeout=60)
response.raise_for_status()

# Voyage AI (the server must have VOYAGE_API_KEY set)
response = requests.post(url, json={
    "texts": ["Document text here"],
    "model": "voyage",
    "input_type": "document"
}, timeout=60)
response.raise_for_status()
```

## Environment Variables

- `PORT`: Server port (default: 7860)
- `VOYAGE_API_KEY`: Voyage AI API key (required only when using the `voyage` model)
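
To illustrate how these two variables are typically consumed, here is a small sketch. It is not the code in `api.py`; `load_settings` and the keys of the returned dict are assumptions made for the example.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read PORT and VOYAGE_API_KEY, applying the documented default port."""
    port = int(env.get("PORT", "7860"))
    # Treat an empty string the same as an unset variable.
    api_key = env.get("VOYAGE_API_KEY") or None
    return {
        "port": port,
        "voyage_api_key": api_key,
        "voyage_enabled": api_key is not None,
    }
```

Passing the environment in as a mapping keeps the function easy to test, for example `load_settings({"PORT": "8000"})`.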

## Interactive Documentation

Once the API is running, visit:
- **Swagger UI**: http://localhost:7860/docs
- **ReDoc**: http://localhost:7860/redoc

## Notes

- Models are downloaded automatically on first startup (~2-3GB total)
- Voyage AI requires an API key from https://www.voyageai.com/
- First request to each model may be slower due to model loading
- Use batch processing for better performance (send multiple texts at once)
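
The batching advice can be sketched as a small client-side helper that splits a long list into fixed-size requests and concatenates the results. The names `chunked` and `embed_in_batches`, the `post` callable, and the default batch size of 32 are all assumptions for illustration; tune the size to your memory limits.

```python
from typing import Callable, Iterator


def chunked(texts: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def embed_in_batches(
    texts: list[str],
    model: str,
    post: Callable[[dict], dict],
    batch_size: int = 32,
) -> list[list[float]]:
    """Send `texts` in batches via `post` and return all embeddings in input order."""
    vectors: list[list[float]] = []
    for batch in chunked(texts, batch_size):
        result = post({"texts": batch, "model": model})
        vectors.extend(result["embeddings"])
    return vectors
```

With `requests`, `post` could be `lambda payload: requests.post(url, json=payload, timeout=60).json()`. Lowering `batch_size` is also the simplest response to out-of-memory errors.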

## Troubleshooting

### Models not loading
- Check available disk space (need ~3GB)
- Ensure internet connection for model download
- Check logs for specific error messages

### Voyage AI not working
- Verify `VOYAGE_API_KEY` is set correctly
- Check API key has sufficient credits
- Ensure `voyageai` package is installed
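
The first and third checks above can be automated with a short preflight script run in the same environment as the server. This is a sketch: `voyage_problems` is a hypothetical helper, the optional `health` argument is assumed to be the parsed `/health` response, and remaining credits cannot be verified locally.

```python
import importlib.util
import os
from typing import Mapping, Optional


def voyage_problems(
    env: Mapping[str, str] = os.environ,
    health: Optional[dict] = None,
) -> list[str]:
    """Return human-readable reasons why Voyage embeddings may be unavailable."""
    problems = []
    if not env.get("VOYAGE_API_KEY"):
        problems.append("VOYAGE_API_KEY is not set")
    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("voyageai") is None:
        problems.append("voyageai package is not installed")
    if health is not None and not health.get("voyage_available", False):
        problems.append("service reports voyage_available = false")
    return problems
```

An empty list means none of these checks failed; it does not guarantee that the key itself is valid.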

### Out of memory
- Reduce batch size (process fewer texts per request)
- Use smaller models (JobBERT v2 instead of Jina)
- Increase container memory limits

## License

This API uses models with different licenses:
- JobBERT v2/v3: Apache 2.0
- Jina AI: Apache 2.0
- Voyage AI: Subject to Voyage AI terms of service