MedGemma 1.5 4B - Mobile Optimized (ONNX/ORT)
This repository provides Google's MedGemma 1.5 4B (Multimodal), quantized to INT4 and converted to the ONNX Runtime (.ort) format. It is specifically tuned for on-device medical image-to-text analysis and reasoning on mobile devices (Android/iOS).
Mobile Optimizations
Unlike standard weights, this version is designed to fit within the RAM constraints of modern smartphones.
- Quantization: Language Decoder (INT4), Embeddings (8-bit UINT8), and Vision Tower (8-bit UINT8).
- Format: Optimized .ort (FlatBuffers) to enable Memory Mapping (mmap), preventing Out-of-Memory (OOM) crashes.
- Size: Reduced from 7.1GB (unoptimized ONNX) to **3.3GB total bundle size**.
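To illustrate why a memory-mappable file helps, here is a minimal Python sketch of the general mmap technique. It is not the mobile runtime's actual loading code: the helper names and the small temporary file standing in for a weights file are hypothetical. The point is that a mapped file can be sliced without first copying the whole file into RAM, since the OS pages in only the regions that are touched.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` through a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are loaded lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


def demo() -> bytes:
    # Hypothetical stand-in for a large weights file (not real model data).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * 4096 + b"WEIGHTS" + b"\x00" * 4096)
        path = tmp.name
    try:
        return read_slice_mmap(path, 4096, 7)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Only the pages backing the requested slice need to be resident, which is the property the .ort format is used for here.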
Memory Requirements
Optimized for high-performance inference on consumer-grade mobile hardware:
- Model Loading: ~3.3 - 4 GB RAM
- Active Inference: +500MB - 1GB (context-dependent)
- Recommended Hardware: Devices with 6GB+ RAM (e.g., iPhone 15 Pro, Pixel 8, or high-end Android tablets).
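The figures above can be combined into a rough peak-memory estimate. The sketch below is only arithmetic on the numbers listed in this section; the `os_reserved_gb` headroom is an assumption made for illustration and is not a measured value from this model card.

```python
from dataclasses import dataclass

# Ranges taken from the list above, in GB.
LOAD_GB = (3.3, 4.0)       # model loading
INFERENCE_GB = (0.5, 1.0)  # additional RAM during active inference


@dataclass
class MemoryEstimate:
    low_gb: float
    high_gb: float


def estimate_peak() -> MemoryEstimate:
    """Sum the loading and inference ranges into a best/worst-case peak."""
    return MemoryEstimate(
        low_gb=LOAD_GB[0] + INFERENCE_GB[0],
        high_gb=LOAD_GB[1] + INFERENCE_GB[1],
    )


def fits(device_ram_gb: float, os_reserved_gb: float = 1.0) -> bool:
    """True if the worst-case peak fits after reserving RAM for the OS.

    os_reserved_gb is an assumed headroom, not a figure from the model card.
    """
    return device_ram_gb - os_reserved_gb >= estimate_peak().high_gb


if __name__ == "__main__":
    peak = estimate_peak()
    print(f"Estimated peak: {peak.low_gb:.1f} to {peak.high_gb:.1f} GB")
    print("6 GB device fits:", fits(6.0))
```

Under these assumptions the worst case is about 5 GB, which is consistent with the 6GB+ recommendation.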
Limitations
- Context Window: Capped at 2048 tokens (compared to 128K in the full model) to ensure memory stability on mobile devices.
- Quantization: Uses INT4 precision. While this significantly reduces size, there may be a slight degradation in diagnostic accuracy compared to the FP16/FP32 original.
- Mobile Only: Optimized for ARM-based CPUs (Android/iOS). Performance on desktop GPUs is not the primary focus.
- No Streaming: The current implementation returns the complete response at once rather than token by token.
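Because of the 2048-token cap, callers may want to budget prompt length against the number of tokens they expect to generate. The following Python sketch shows one possible policy (keep the most recent prompt tokens). The function name and the truncation policy are assumptions for illustration, not part of the Flutter plugin, and left-truncation may be unsuitable for multimodal prompts where image tokens appear early in the sequence.

```python
MAX_CONTEXT_TOKENS = 2048  # cap stated in the limitations above


def fit_to_context(prompt_ids: list[int], max_new_tokens: int,
                   limit: int = MAX_CONTEXT_TOKENS) -> list[int]:
    """Trim prompt tokens so prompt + generated tokens stay within `limit`.

    Hypothetical helper: keeps the most recent tokens and drops the oldest.
    """
    if not 0 < max_new_tokens < limit:
        raise ValueError("max_new_tokens must be between 1 and limit - 1")
    budget = limit - max_new_tokens
    if len(prompt_ids) <= budget:
        return prompt_ids
    return prompt_ids[-budget:]


if __name__ == "__main__":
    trimmed = fit_to_context(list(range(3000)), max_new_tokens=256)
    print(len(trimmed))
```

With a 256-token generation budget, the prompt is limited to 1792 tokens.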
Export Details
This model was exported and optimized using the ONNX Runtime GenAI model builder pipeline.
python -m onnxruntime_genai.models.builder \
    -m google/medgemma-1.5-4b \
    -o medgemma_onnx_mobile \
    -p int4 \
    -e cpu
The resulting .onnx files were further converted to .ort format for mobile memory management.
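This card does not state which tool performed the .onnx to .ort conversion. One commonly used option is ONNX Runtime's own converter module, `onnxruntime.tools.convert_onnx_models_to_ort`, which accepts a model file or directory. The Python sketch below only builds and (optionally) runs that command; the directory name is taken from the export command above, the helper names are hypothetical, and whether this exact invocation matches the author's pipeline or handles these particular files without extra options is an assumption.

```python
import subprocess
import sys
from pathlib import Path


def build_ort_convert_cmd(model_dir: str) -> list[str]:
    """Build the command line for ONNX Runtime's ORT-format converter."""
    return [
        sys.executable,
        "-m",
        "onnxruntime.tools.convert_onnx_models_to_ort",
        str(Path(model_dir)),
    ]


def convert(model_dir: str) -> None:
    """Run the converter; requires the onnxruntime package to be installed."""
    subprocess.run(build_ort_convert_cmd(model_dir), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(build_ort_convert_cmd("medgemma_onnx_mobile")))
```

Calling `convert("medgemma_onnx_mobile")` would execute the conversion in the current Python environment.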
Usage (Flutter)
This model is intended for use with the flutter_onnxruntime_genai package.
Text-Only Example
final result = await onnx.runInferenceWithConfigAsync(
  modelPath: '/path/to/model',
  prompt: 'What are the symptoms of hypertension?',
  providers: ['XNNPACK'],
);
Multimodal Example (Vision)
Medical images should be resized to 896x896 pixels, as specified in processor_config.json.
final visionResult = await onnx.runInferenceWithConfigAsync(
  modelPath: '/path/to/model',
  prompt: 'Analyze this medical image: <image>',
  imagePath: '/path/to/scan.jpg',
  providers: ['XNNPACK'],
);
License
Apache 2.0 (Inherited from the base Google MedGemma model).
Citation
If you use this model, please cite both the original MedGemma paper and this mobile optimization:
Original MedGemma Model:
@article{sellergren2025medgemma,
title={MedGemma Technical Report},
author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and Traverse, Madeleine and Kohlberger, Timo and Xu, Shawn and Jamil, Fayaz and Hughes, Cían and Lau, Charles and others},
journal={arXiv preprint arXiv:2507.05201},
year={2025}
}
This ONNX Conversion:
@misc{medgemma-onnx-mobile,
title={MedGemma 1.5 ONNX INT4 for Mobile},
author={EL OUBAYDI, Soufiane},
year={2026},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/eloubaydi/medgemma-1.5-ort-standard}}
}
Acknowledgments
- Base Model: Google MedGemma
- Quantization: ONNX Runtime GenAI
- Mobile Runtime: flutter_onnxruntime_genai
Support
- Model/Weights Issues: Open an issue on this repository.
- ONNX Runtime: onnxruntime-genai issues
- Flutter Plugin: flutter_onnxruntime_genai issues
Disclaimer:
This model is for research and development only. It is not a certified medical device and should not be used for primary diagnosis or clinical decision-making.
Base model: google/medgemma-1.5-4b-it