---
title: FastVLM Private
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.44.0
pinned: false
license: apache-2.0
hardware: zero-gpu
---
# π FastVLM Private REST API
A private Flask server running Apple's FastVLM-0.5B on ZeroGPU H200 hardware for real-time vision analysis.
## Configuration
- Visibility: Public
- Hardware: ZeroGPU H200 (140GB VRAM)
- Model: apple/FastVLM-0.5B
- SDK: Docker (Flask server)
- Port: 7860
## API Endpoints

### Health Check

```
GET /health
```

Response:

```json
{
  "status": "healthy",
  "model": "apple/FastVLM-0.5B",
  "device": "cuda",
  "space": "NeuralThyself/fastvlm-private"
}
```
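For quick monitoring, the payload above can be checked programmatically. A minimal sketch in JavaScript (the `isHealthy` helper and the fields it inspects are illustrative, not part of the Space's code; the injectable `fetchImpl` parameter just makes the check testable without a network call):

```javascript
// Illustrative helper: fetch /health and verify the payload shape shown above.
// Returns true only when the model reports healthy and is running on CUDA.
async function isHealthy(fetchImpl = fetch) {
  const res = await fetchImpl(
    'https://neuralthyself-fastvlm-private.hf.space/health'
  );
  const data = await res.json();
  return data.status === 'healthy' && data.device === 'cuda';
}
```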
### Analyze Image

```
POST /analyze
Content-Type: application/json
```

Request body:

```json
{
  "imageBase64": "base64_encoded_image_string",
  "prompt": "Analyze this screenshot and describe what you see."
}
```

Response:

```json
{
  "analysis": {
    "raw_output": "The image shows...",
    "elements": ["Button", "Text field", ...],
    "hierarchy": {...},
    "spatial_info": {...},
    "detected_issues": [...],
    "code_suggestions": [...]
  }
}
```
## Usage Example

```javascript
// Direct REST API call (no queuing)
const response = await fetch(
  'https://neuralthyself-fastvlm-private.hf.space/analyze',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      imageBase64: base64Image,
      prompt: "Analyze this screenshot"
    })
  }
);

const data = await response.json();
console.log(data.analysis);
```
## Technical Details

- Framework: Flask 3.0.0
- Inference Engine: PyTorch 2.4.0 + transformers 4.46.3
- GPU Allocation: ZeroGPU decorator (`@spaces.GPU(duration=10)`)
- Cold Start: 30-60s (model download + GPU allocation)
- Warm Inference: 2-5s per request

**Note:** The first request after idle may take 30-60 seconds while the model loads and a GPU is allocated. Subsequent requests complete in 2-5 seconds.
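Given that cold-start window, callers may prefer to retry a failed or timed-out request rather than surface the first error. A minimal sketch (the `withRetry` wrapper and its default values are illustrative, not part of the Space):

```javascript
// Illustrative wrapper: retry an async request a few times with a fixed
// delay, to ride out ZeroGPU cold starts. `fn` is any async function
// returning a result; it is re-invoked on each attempt.
async function withRetry(fn, { retries = 3, delayMs = 5000 } = {}) {
  let lastErr;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (attempt < retries) {
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastErr;
}
```

The fetch call from the Usage Example can then be wrapped as `withRetry(() => fetch(...))` so the initial allocation delay does not fail the caller.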