---
title: FastVLM Private
emoji: πŸ”’
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.44.0
pinned: false
license: apache-2.0
hardware: zero-gpu
---

# πŸ”’ FastVLM Private REST API

A private Flask server running Apple's FastVLM-0.5B on a ZeroGPU H200 for real-time vision analysis.

## Configuration

- **Visibility:** Public
- **Hardware:** ZeroGPU H200 (140GB VRAM)
- **Model:** `apple/FastVLM-0.5B`
- **SDK:** Docker (Flask server)
- **Port:** 7860

## API Endpoints

### Health Check

```http
GET /health
```

Response:

```json
{
  "status": "healthy",
  "model": "apple/FastVLM-0.5B",
  "device": "cuda",
  "space": "NeuralThyself/fastvlm-private"
}
```
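
To check that the server is up before sending work, the endpoint can be probed directly. A minimal sketch in Node 18+, assuming the Space URL shown in the usage example below:

```javascript
// Probe the /health endpoint (sketch; Space URL taken from the
// usage example below β€” adjust if your Space name differs).
const res = await fetch(
  'https://neuralthyself-fastvlm-private.hf.space/health'
);
const health = await res.json();
console.log(health.status, health.device); // e.g. "healthy cuda"
```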

### Analyze Image

```http
POST /analyze
Content-Type: application/json

{
  "imageBase64": "base64_encoded_image_string",
  "prompt": "Analyze this screenshot and describe what you see."
}
```

Response:

```json
{
  "analysis": {
    "raw_output": "The image shows...",
    "elements": ["Button", "Text field", ...],
    "hierarchy": {...},
    "spatial_info": {...},
    "detected_issues": [...],
    "code_suggestions": [...]
  }
}
```
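
The request carries the image as a base64 string. A minimal sketch of building the body in Node, assuming the server expects raw base64 without a `data:` URI prefix (the README does not specify the exact encoding):

```javascript
import { readFileSync } from 'node:fs';

// Read a screenshot from disk and base64-encode it.
// Assumption: raw base64, no "data:image/png;base64," prefix.
const base64Image = readFileSync('screenshot.png').toString('base64');

const body = {
  imageBase64: base64Image,
  prompt: 'Analyze this screenshot and describe what you see.'
};
```

The resulting `base64Image` is what the usage example below passes into `JSON.stringify`.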

## Usage Example

```javascript
// Direct REST API call (no queuing)
const response = await fetch(
  'https://neuralthyself-fastvlm-private.hf.space/analyze',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      imageBase64: base64Image,
      prompt: "Analyze this screenshot"
    })
  }
);

const data = await response.json();
console.log(data.analysis);
```

## Technical Details

- **Framework:** Flask 3.0.0
- **Inference Engine:** PyTorch 2.4.0 + transformers 4.46.3
- **GPU Allocation:** ZeroGPU decorator (`@spaces.GPU(duration=10)`)
- **Cold Start:** 30-60s (model download + GPU allocation)
- **Warm Inference:** 2-5s per request

**Note:** The first request after the Space has been idle may take 30-60 seconds (model download and GPU allocation); subsequent requests are fast (2-5s).
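
Because of that cold-start window, callers may want a simple retry loop with a generous timeout. A sketch, where the 90-second timeout and three attempts are illustrative assumptions rather than documented limits:

```javascript
// Retry wrapper for cold starts (sketch). The timeout and attempt
// count are illustrative assumptions, not documented limits.
async function analyzeWithRetry(body, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(
        'https://neuralthyself-fastvlm-private.hf.space/analyze',
        {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(body),
          signal: AbortSignal.timeout(90_000) // allow for a cold start
        }
      );
      if (res.ok) return res.json();
    } catch (err) {
      if (i === attempts - 1) throw err;
    }
  }
  throw new Error('analyze failed after retries');
}
```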