---
title: FastVLM Private
emoji: 🔒
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.44.0
pinned: false
license: apache-2.0
hardware: zero-gpu
---

# 🔒 FastVLM Private REST API

Private Flask server running Apple's FastVLM-0.5B on ZeroGPU H200 for real-time vision analysis.

## Configuration

- **Visibility**: Public
- **Hardware**: ZeroGPU H200 (140GB VRAM)
- **Model**: apple/FastVLM-0.5B
- **SDK**: Docker (Flask server)
- **Port**: 7860

## API Endpoints

### Health Check

```bash
GET /health
```

**Response:**

```json
{
  "status": "healthy",
  "model": "apple/FastVLM-0.5B",
  "device": "cuda",
  "space": "NeuralThyself/fastvlm-private"
}
```

### Analyze Image

```bash
POST /analyze
Content-Type: application/json

{
  "imageBase64": "base64_encoded_image_string",
  "prompt": "Analyze this screenshot and describe what you see."
}
```

**Response:**

```json
{
  "analysis": {
    "raw_output": "The image shows...",
    "elements": ["Button", "Text field", ...],
    "hierarchy": {...},
    "spatial_info": {...},
    "detected_issues": [...],
    "code_suggestions": [...]
  }
}
```

## Usage Example

```typescript
// Direct REST API call (no queuing)
const response = await fetch(
  'https://neuralthyself-fastvlm-private.hf.space/analyze',
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      imageBase64: base64Image,
      prompt: "Analyze this screenshot"
    })
  }
);

const data = await response.json();
console.log(data.analysis);
```

## Technical Details

- **Framework**: Flask 3.0.0
- **Inference Engine**: PyTorch 2.4.0 + transformers 4.46.3
- **GPU Allocation**: ZeroGPU decorator (`@spaces.GPU(duration=10)`)
- **Cold Start**: 30-60s (model download + GPU allocation)
- **Warm Inference**: 2-5s per request

---

**Note:** First request after idle may take 30-60 seconds. Subsequent requests are fast (2-5s).
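
## Handling Cold Starts

Because the first request after idle can take 30-60 seconds, a client may want to wait for `/health` to report `healthy` before sending images. Below is a minimal sketch, assuming the same base URL as above and a simple fixed-interval retry; the attempt count and delay are arbitrary choices (12 × 5s ≈ the stated cold-start window), not part of the API.

```typescript
const BASE_URL = 'https://neuralthyself-fastvlm-private.hf.space';

// Poll GET /health until the server reports "healthy" or the attempts run out.
async function waitUntilHealthy(attempts = 12, delayMs = 5000): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(`${BASE_URL}/health`);
      if (res.ok) {
        const body = await res.json();
        if (body.status === 'healthy') return;
      }
    } catch {
      // The Space may still be waking up; fall through to the retry delay.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('fastvlm-private did not become healthy in time');
}

await waitUntilHealthy();
// Subsequent /analyze calls should now hit the 2-5s warm path.
```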
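
## Encoding an Image for `/analyze`

The `/analyze` endpoint expects a base64-encoded image in the `imageBase64` field. One way to produce that string from a file in Node.js is sketched below; it assumes Node 18+ (built-in `fetch`, top-level `await` in an ES module), and the file path and prompt are placeholders, not values required by the API.

```typescript
import { readFileSync } from 'node:fs';

// Read a screenshot from disk and base64-encode it (hypothetical path).
const base64Image = readFileSync('./screenshot.png').toString('base64');

const response = await fetch(
  'https://neuralthyself-fastvlm-private.hf.space/analyze',
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      imageBase64: base64Image,
      prompt: 'Analyze this screenshot and describe what you see.'
    })
  }
);

const data = await response.json();
console.log(data.analysis.raw_output);
```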
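
## Typing the `/analyze` Response

For TypeScript clients, a partial interface based only on the field names shown in the response example above can make the result easier to work with. The shapes of `hierarchy`, `spatial_info`, `detected_issues`, and `code_suggestions` are not documented here, so they are left as `unknown` rather than guessed.

```typescript
// Partial typing of the /analyze response; undocumented shapes stay unknown.
interface AnalyzeResponse {
  analysis: {
    raw_output: string;
    elements: string[];
    hierarchy: unknown;
    spatial_info: unknown;
    detected_issues: unknown[];
    code_suggestions: unknown[];
  };
}

// Usage: cast the parsed JSON from the fetch call shown earlier.
// const data = (await response.json()) as AnalyzeResponse;
// console.log(data.analysis.elements);
```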