MedGemma 1.5 4B - Mobile Optimized (ONNX/ORT)

This repository provides Google's MedGemma 1.5 4B (Multimodal), quantized to INT4 and converted to the ONNX Runtime (.ort) format. It is specifically tuned for on-device medical image-to-text analysis and reasoning on mobile devices (Android/iOS).

🚀 Mobile Optimizations

Unlike standard weights, this version is designed to fit within the RAM constraints of modern smartphones.

Quantization: Language Decoder (INT4), Embeddings (UINT8), and Vision Tower (UINT8).
Format: Optimized .ort (FlatBuffers) to enable memory mapping (mmap), which reduces the risk of Out-of-Memory (OOM) crashes.
Size: Reduced from 7.1 GB (unoptimized ONNX) to 3.3 GB total bundle size.
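
To illustrate why a memory-mappable format helps, the following is a minimal sketch using Python's standard mmap module. It is not how ONNX Runtime loads .ort files internally; the helper name read_slice_mmap and the stand-in file are hypothetical and only show that a mapped file can be read in slices without copying the whole file into RAM first.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` without reading the whole file up front."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the OS pages data in lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


# Small stand-in file for demonstration (hypothetical, not a real model file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096 + b"WEIGHTS" + b"\x00" * 4096)
    demo_path = tmp.name

chunk = read_slice_mmap(demo_path, 4096, 7)
os.remove(demo_path)
print(chunk)
```

Because pages are loaded on demand and can be evicted by the OS under pressure, a mapped model file tends to put less strain on the app's memory limit than a full in-memory copy.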

📊 Memory Requirements

Optimized for high-performance inference on consumer-grade mobile hardware:

Model Loading: ~3.3 to 4 GB RAM
Active Inference: an additional 500 MB to 1 GB (context-dependent)
Recommended Hardware: Devices with 6 GB+ RAM (e.g., iPhone 15 Pro, Pixel 8, or high-end Android tablets).
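
The figures above can be combined into a rough peak-memory estimate. The sketch below only adds the stated ranges together; the class and method names are hypothetical, and it ignores memory used by the OS and other apps, so treat the result as an upper-level sanity check rather than a guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryBudget:
    # Ranges in GB, taken from the figures listed above.
    load_gb: tuple = (3.3, 4.0)
    inference_gb: tuple = (0.5, 1.0)

    def peak_range_gb(self) -> tuple:
        """Lowest and highest expected peak usage (load + inference)."""
        low = self.load_gb[0] + self.inference_gb[0]
        high = self.load_gb[1] + self.inference_gb[1]
        return (low, high)

    def headroom_gb(self, device_ram_gb: float) -> float:
        """RAM left over in the worst case; negative means it will not fit."""
        return device_ram_gb - self.peak_range_gb()[1]


budget = MemoryBudget()
low, high = budget.peak_range_gb()
print(f"Estimated peak: {low:.1f} to {high:.1f} GB")
print(f"Worst-case headroom on a 6 GB device: {budget.headroom_gb(6.0):.1f} GB")
```

Under these numbers a 6 GB device leaves about 1 GB of worst-case headroom, which is consistent with the 6 GB+ recommendation.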

⚠️ Limitations

Context Window: Capped at 2048 tokens (compared to 128K in the full model) to ensure memory stability on mobile devices.
Quantization: Uses INT4 precision. While this significantly reduces size, there may be a slight degradation in diagnostic accuracy compared to the FP16/FP32 original.
Mobile Only: Optimized for ARM-based CPUs (Android/iOS). Performance on desktop GPUs is not the primary focus.
No Streaming: The current implementation returns the complete response at once rather than token by token.
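
Because the context window is capped at 2048 tokens, callers may want to budget prompt and output tokens before running inference. The following is a minimal sketch of one way to do that; fit_to_context is a hypothetical helper, not part of any package mentioned here, and it uses naive left truncation that a real app would need to adapt so it does not cut through image placeholder tokens or chat-template markers.

```python
MAX_CONTEXT_TOKENS = 2048  # cap stated in the limitations above


def fit_to_context(prompt_ids, max_new_tokens, max_context=MAX_CONTEXT_TOKENS):
    """Trim a tokenized prompt so prompt + generated tokens fit in the window.

    Returns (trimmed_prompt_ids, max_new_tokens).
    """
    if not 1 <= max_new_tokens < max_context:
        raise ValueError("max_new_tokens must be in [1, max_context)")
    prompt_budget = max_context - max_new_tokens
    prompt_ids = list(prompt_ids)
    if len(prompt_ids) > prompt_budget:
        # Keep the most recent tokens and drop the oldest ones.
        prompt_ids = prompt_ids[-prompt_budget:]
    return prompt_ids, max_new_tokens


# Example with dummy token IDs (assumption: tokenization happens elsewhere).
long_prompt = list(range(3000))
trimmed, new_tokens = fit_to_context(long_prompt, max_new_tokens=256)
print(len(trimmed), new_tokens)
```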

🛠️ Export Details

This model was exported and optimized using the ONNX Runtime GenAI model builder pipeline.

python -m onnxruntime_genai.models.builder \
    -m google/medgemma-1.5-4b \
    -o medgemma_onnx_mobile \
    -p int4 \
    -e cpu

The resulting .onnx files were further converted to .ort format for mobile memory management.
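
The exact conversion command is not given here. One common way to produce .ort files is the converter module that ships with the onnxruntime Python package; the sketch below only builds that command line, assuming the converter was used with default settings. The helper name is hypothetical, and the directory name is the output directory from the export command above.

```python
import subprocess
import sys


def build_ort_convert_command(onnx_dir: str) -> list:
    """Build the command that converts every .onnx model under `onnx_dir` to .ort."""
    return [
        sys.executable,
        "-m",
        "onnxruntime.tools.convert_onnx_models_to_ort",
        onnx_dir,
    ]


cmd = build_ort_convert_command("medgemma_onnx_mobile")
print(" ".join(cmd))

# Uncomment to actually run the conversion (requires onnxruntime to be installed):
# subprocess.run(cmd, check=True)
```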

📱 Usage (Flutter)

This model is intended for use with the flutter_onnxruntime_genai package.

Text-Only Example

// Assumes `onnx` is an already-initialized instance provided by flutter_onnxruntime_genai.
final result = await onnx.runInferenceWithConfigAsync(
  modelPath: '/path/to/model',
  prompt: 'What are the symptoms of hypertension?',
  providers: ['XNNPACK'],
);

Multimodal Example (Vision)

Medical images should be resized to 896x896 pixels, as specified in processor_config.json.

final visionResult = await onnx.runInferenceWithConfigAsync(
  modelPath: '/path/to/model',
  prompt: 'Analyze this medical image: <image>',
  imagePath: '/path/to/scan.jpg',
  providers: ['XNNPACK'],
);

⚖️ License

Apache 2.0 (Inherited from the base Google MedGemma model).

📝 Citation

If you use this model, please cite both the original MedGemma paper and this mobile optimization:

Original MedGemma Model:

@article{sellergren2025medgemma,
  title={MedGemma Technical Report},
  author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and Traverse, Madeleine and Kohlberger, Timo and Xu, Shawn and Jamil, Fayaz and Hughes, Cían and Lau, Charles and others},
  journal={arXiv preprint arXiv:2507.05201},
  year={2025}
}

This ONNX Conversion:

@misc{medgemma-onnx-mobile,
  title={MedGemma 1.5 ONNX INT4 for Mobile},
  author={EL OUBAYDI, Soufiane},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/eloubaydi/medgemma-1.5-ort-standard}}
}

🤝 Acknowledgments

Base Model: Google MedGemma

Quantization: ONNX Runtime GenAI

Mobile Runtime: flutter_onnxruntime_genai

🆘 Support

Model/Weights Issues: Open an issue on this repository.

ONNX Runtime: onnxruntime-genai issues

Flutter Plugin: flutter_onnxruntime_genai issues

Disclaimer:

This model is for research and development only. It is not a certified medical device and should not be used for primary diagnosis or clinical decision-making.

