---
title: FastVLM Private
emoji: 🔒
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.44.0
pinned: false
license: apache-2.0
hardware: zero-gpu
---

# 🔒 FastVLM Private REST API

Private Flask server running Apple's FastVLM-0.5B on ZeroGPU H200 for real-time vision analysis.

## Configuration

- **Visibility**: Public
- **Hardware**: ZeroGPU H200 (140GB VRAM)
- **Model**: apple/FastVLM-0.5B
- **SDK**: Docker (Flask server)
- **Port**: 7860

## API Endpoints

### Health Check

```bash
GET /health
```

**Response:**

```json
{
  "status": "healthy",
  "model": "apple/FastVLM-0.5B",
  "device": "cuda",
  "space": "NeuralThyself/fastvlm-private"
}
```

### Analyze Image

```bash
POST /analyze
Content-Type: application/json

{
  "imageBase64": "base64_encoded_image_string",
  "prompt": "Analyze this screenshot and describe what you see."
}
```

**Response:**

```json
{
  "analysis": {
    "raw_output": "The image shows...",
    "elements": ["Button", "Text field", ...],
    "hierarchy": {...},
    "spatial_info": {...},
    "detected_issues": [...],
    "code_suggestions": [...]
  }
}
```

## Usage Example

```typescript
// Direct REST API call (no queuing)
const response = await fetch(
  'https://neuralthyself-fastvlm-private.hf.space/analyze',
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      imageBase64: base64Image,
      prompt: "Analyze this screenshot"
    })
  }
);

const data = await response.json();
console.log(data.analysis);
```

## Technical Details

- **Framework**: Flask 3.0.0
- **Inference Engine**: PyTorch 2.4.0 + transformers 4.46.3
- **GPU Allocation**: ZeroGPU decorator (`@spaces.GPU(duration=10)`)
- **Cold Start**: 30-60s (model download + GPU allocation)
- **Warm Inference**: 2-5s per request

---

**Note:** First request after idle may take 30-60 seconds. Subsequent requests are fast (2-5s).
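
## Handling Cold Starts

Because the first request after idle can take 30-60 seconds, a client may want to wait for `/health` to report `healthy` before sending images. Below is a minimal sketch, assuming the same base URL as above and a simple fixed-interval retry; the attempt count and delay are arbitrary choices (12 × 5s ≈ the stated cold-start window), not part of the API.

```typescript
const BASE_URL = 'https://neuralthyself-fastvlm-private.hf.space';

// Poll GET /health until the server reports "healthy" or the attempts run out.
async function waitUntilHealthy(attempts = 12, delayMs = 5000): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(`${BASE_URL}/health`);
      if (res.ok) {
        const body = await res.json();
        if (body.status === 'healthy') return;
      }
    } catch {
      // The Space may still be waking up; fall through to the retry delay.
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('fastvlm-private did not become healthy in time');
}

await waitUntilHealthy();
// Subsequent /analyze calls should now hit the 2-5s warm path.
```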
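
## Encoding an Image for `/analyze`

The `/analyze` endpoint expects a base64-encoded image in the `imageBase64` field. One way to produce that string from a file in Node.js is sketched below; it assumes Node 18+ (built-in `fetch`, top-level `await` in an ES module), and the file path and prompt are placeholders, not values required by the API.

```typescript
import { readFileSync } from 'node:fs';

// Read a screenshot from disk and base64-encode it (hypothetical path).
const base64Image = readFileSync('./screenshot.png').toString('base64');

const response = await fetch(
  'https://neuralthyself-fastvlm-private.hf.space/analyze',
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      imageBase64: base64Image,
      prompt: 'Analyze this screenshot and describe what you see.'
    })
  }
);

const data = await response.json();
console.log(data.analysis.raw_output);
```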
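
## Typing the `/analyze` Response

For TypeScript clients, a partial interface based only on the field names shown in the response example above can make the result easier to work with. The shapes of `hierarchy`, `spatial_info`, `detected_issues`, and `code_suggestions` are not documented here, so they are left as `unknown` rather than guessed.

```typescript
// Partial typing of the /analyze response; undocumented shapes stay unknown.
interface AnalyzeResponse {
  analysis: {
    raw_output: string;
    elements: string[];
    hierarchy: unknown;
    spatial_info: unknown;
    detected_issues: unknown[];
    code_suggestions: unknown[];
  };
}

// Usage: cast the parsed JSON from the fetch call shown earlier.
// const data = (await response.json()) as AnalyzeResponse;
// console.log(data.analysis.elements);
```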