---
title: FastVLM Private
emoji: πŸ”’
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.44.0
pinned: false
license: apache-2.0
hardware: zero-gpu
---

# πŸ”’ FastVLM Private REST API

A private Flask server running Apple's FastVLM-0.5B on a ZeroGPU H200 for real-time vision analysis.

## Configuration

- **Visibility:** Public
- **Hardware:** ZeroGPU H200 (140GB VRAM)
- **Model:** `apple/FastVLM-0.5B`
- **SDK:** Docker (Flask server)
- **Port:** 7860

## API Endpoints

### Health Check

```http
GET /health
```

Response:

```json
{
  "status": "healthy",
  "model": "apple/FastVLM-0.5B",
  "device": "cuda",
  "space": "NeuralThyself/fastvlm-private"
}
```
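
To check that the server is up before sending work, the endpoint can be probed directly. A minimal sketch in Node 18+, assuming the Space URL shown in the usage example below:

```javascript
// Probe the /health endpoint (sketch; Space URL taken from the
// usage example below β€” adjust if your Space name differs).
const res = await fetch(
  'https://neuralthyself-fastvlm-private.hf.space/health'
);
const health = await res.json();
console.log(health.status, health.device); // e.g. "healthy cuda"
```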

### Analyze Image

```http
POST /analyze
Content-Type: application/json

{
  "imageBase64": "base64_encoded_image_string",
  "prompt": "Analyze this screenshot and describe what you see."
}
```

Response:

```json
{
  "analysis": {
    "raw_output": "The image shows...",
    "elements": ["Button", "Text field", ...],
    "hierarchy": {...},
    "spatial_info": {...},
    "detected_issues": [...],
    "code_suggestions": [...]
  }
}
```
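
The request carries the image as a base64 string. A minimal sketch of building the body in Node, assuming the server expects raw base64 without a `data:` URI prefix (the README does not specify the exact encoding):

```javascript
import { readFileSync } from 'node:fs';

// Read a screenshot from disk and base64-encode it.
// Assumption: raw base64, no "data:image/png;base64," prefix.
const base64Image = readFileSync('screenshot.png').toString('base64');

const body = {
  imageBase64: base64Image,
  prompt: 'Analyze this screenshot and describe what you see.'
};
```

The resulting `base64Image` is what the usage example below passes into `JSON.stringify`.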

## Usage Example

```javascript
// Direct REST API call (no queuing)
const response = await fetch(
  'https://neuralthyself-fastvlm-private.hf.space/analyze',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      imageBase64: base64Image,
      prompt: "Analyze this screenshot"
    })
  }
);

const data = await response.json();
console.log(data.analysis);
```

## Technical Details

- **Framework:** Flask 3.0.0
- **Inference Engine:** PyTorch 2.4.0 + transformers 4.46.3
- **GPU Allocation:** ZeroGPU decorator (`@spaces.GPU(duration=10)`)
- **Cold Start:** 30-60s (model download + GPU allocation)
- **Warm Inference:** 2-5s per request

**Note:** The first request after the Space has been idle may take 30-60 seconds (model download and GPU allocation); subsequent requests are fast (2-5s).
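
Because of that cold-start window, callers may want a simple retry loop with a generous timeout. A sketch, where the 90-second timeout and three attempts are illustrative assumptions rather than documented limits:

```javascript
// Retry wrapper for cold starts (sketch). The timeout and attempt
// count are illustrative assumptions, not documented limits.
async function analyzeWithRetry(body, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(
        'https://neuralthyself-fastvlm-private.hf.space/analyze',
        {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(body),
          signal: AbortSignal.timeout(90_000) // allow for a cold start
        }
      );
      if (res.ok) return res.json();
    } catch (err) {
      if (i === attempts - 1) throw err;
    }
  }
  throw new Error('analyze failed after retries');
}
```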