MedGemma 1.5 4B - Mobile Optimized (ONNX/ORT)

This repository provides Google's MedGemma 1.5 4B (Multimodal), quantized to INT4 and converted to the ONNX Runtime (.ort) format. It is specifically tuned for on-device medical image-to-text analysis and reasoning on mobile devices (Android/iOS).


🚀 Mobile Optimizations

Unlike standard weights, this version is designed to fit within the RAM constraints of modern smartphones.

  • Quantization: Language Decoder (INT4), Embeddings (UINT8), and Vision Tower (UINT8).
  • Format: Optimized .ort (Flatbuffers) to enable Memory Mapping (mmap), preventing Out-of-Memory (OOM) crashes.
  • Size: Reduced from 7.1GB (unoptimized ONNX) to **3.3GB total bundle size**.
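The memory-mapping point above can be illustrated with a minimal sketch using Python's standard `mmap` module. It shows the general idea only: a mapped file is paged in on demand, so reading a small slice does not require copying the whole file into RAM. The stand-in file and the `read_slice_mmap` helper are hypothetical and are not part of this repository or of ONNX Runtime.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` through a read-only memory map.

    Only the pages backing the requested slice need to be resident;
    the rest of the file stays on disk until it is touched.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


# Small stand-in file (hypothetical; not a real .ort model).
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"\x00" * 4096 + b"WEIGHTS" + b"\x00" * 4096)
    demo_path = tmp.name

chunk = read_slice_mmap(demo_path, 4096, 7)
os.remove(demo_path)
print(chunk)
```

How ONNX Runtime maps `.ort` files internally is not described in this README, so treat the snippet as an analogy rather than a description of the runtime.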

📊 Memory Requirements

Optimized for high-performance inference on consumer-grade mobile hardware:

  • Model Loading: ~3.3–4 GB RAM
  • Active Inference: an additional 0.5–1 GB (context-dependent)
  • Recommended Hardware: Devices with 6GB+ RAM (e.g., iPhone 15 Pro, Pixel 8, or high-end Android tablets).
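The figures above can be combined into a rough peak-memory estimate. The sketch below simply adds the load and inference ranges quoted in this section. The `os_reserved_gb` parameter and its default are assumptions introduced for illustration, since the RAM actually available to an app varies by device and operating system.

```python
from typing import Tuple

# Ranges quoted in this README, in GB.
MODEL_LOAD_GB = (3.3, 4.0)
INFERENCE_EXTRA_GB = (0.5, 1.0)


def peak_memory_gb() -> Tuple[float, float]:
    """Return the (low, high) estimate of peak RAM use in GB."""
    low = MODEL_LOAD_GB[0] + INFERENCE_EXTRA_GB[0]
    high = MODEL_LOAD_GB[1] + INFERENCE_EXTRA_GB[1]
    return low, high


def fits_on_device(device_ram_gb: float, os_reserved_gb: float = 1.0) -> bool:
    """Check the worst-case estimate against the RAM left for the app.

    `os_reserved_gb` is a hypothetical placeholder, not a measured value.
    """
    _, high = peak_memory_gb()
    return device_ram_gb - os_reserved_gb >= high


low, high = peak_memory_gb()
print(f"Estimated peak: {low:.1f}-{high:.1f} GB")
print("6 GB device fits:", fits_on_device(6.0))
print("4 GB device fits:", fits_on_device(4.0))
```

Under these assumptions the worst case is about 5 GB, which is consistent with the 6GB+ recommendation.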

⚠️ Limitations

  • Context Window: Capped at 2048 tokens (compared to 128K in the full model) to ensure memory stability on mobile devices.
  • Quantization: Uses INT4 precision. While this significantly reduces size, there may be a slight degradation in diagnostic accuracy compared to the FP16/FP32 original.
  • Mobile Only: Optimized for ARM-based CPUs (Android/iOS). Performance on desktop GPUs is not the primary focus.
  • No Streaming: The current implementation returns the complete response at once; token-by-token streaming is not supported.
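Because the context window is capped at 2048 tokens, the prompt, any image tokens, and the generated answer all share one budget. The sketch below shows that arithmetic. The `generation_budget` helper is hypothetical, and the figure of 256 tokens per image is an assumption carried over from the Gemma 3 architecture; it has not been verified for this export.

```python
CONTEXT_LIMIT = 2048          # cap stated in this README
ASSUMED_IMAGE_TOKENS = 256    # assumption: Gemma 3 style image encoding


def generation_budget(prompt_tokens: int,
                      image_tokens: int = 0,
                      context_limit: int = CONTEXT_LIMIT) -> int:
    """Return how many tokens remain for the model's answer."""
    if prompt_tokens < 0 or image_tokens < 0:
        raise ValueError("token counts must be non-negative")
    used = prompt_tokens + image_tokens
    if used >= context_limit:
        raise ValueError(
            f"input uses {used} tokens, leaving no room within {context_limit}"
        )
    return context_limit - used


# A 300-token text prompt plus one image under the assumed encoding.
remaining = generation_budget(300, ASSUMED_IMAGE_TOKENS)
print(remaining)
```

A caller could use the returned value as an upper bound on the requested output length, or shorten the prompt when the helper raises.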

🛠️ Export Details

This model was exported and optimized using the ONNX Runtime GenAI model builder pipeline.

python -m onnxruntime_genai.models.builder \
    -m google/medgemma-1.5-4b \
    -o medgemma_onnx_mobile \
    -p int4 \
    -e cpu

The resulting .onnx files were then converted to the .ort format so that the runtime can memory-map them on mobile devices.
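The exact conversion command used for this repository is not stated. One plausible route is the `convert_onnx_models_to_ort` tool that ships with the `onnxruntime` Python package. The sketch below only builds the command line; the helper name is hypothetical, the directory is taken from the export command above, and whether the author used this tool or any extra flags is an assumption.

```python
import sys
from typing import List


def build_ort_convert_command(model_dir: str) -> List[str]:
    """Build the argv for ONNX Runtime's ONNX-to-ORT conversion tool.

    The tool accepts a model file or a directory as its positional argument.
    """
    return [
        sys.executable,
        "-m",
        "onnxruntime.tools.convert_onnx_models_to_ort",
        model_dir,
    ]


cmd = build_ort_convert_command("medgemma_onnx_mobile")
print(" ".join(cmd))

# To run the conversion for real (requires `pip install onnxruntime`):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Models with large external weight files may need additional handling that this sketch does not cover.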

📱 Usage (Flutter)

This model is intended for use with the flutter_onnxruntime_genai package.

Text-Only Example

final result = await onnx.runInferenceWithConfigAsync(
  modelPath: '/path/to/model',
  prompt: 'What are the symptoms of hypertension?',
  providers: ['XNNPACK'], 
);

Multimodal Example (Vision)

Medical images should be resized to 896x896 pixels, as specified in processor_config.json.

final visionResult = await onnx.runInferenceWithConfigAsync(
  modelPath: '/path/to/model',
  prompt: 'Analyze this medical image: <image>',
  imagePath: '/path/to/scan.jpg',
  providers: ['XNNPACK'],
);
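For the 896x896 input size mentioned in this section, one option when preparing images offline is an aspect-preserving fit with padding (letterboxing), which avoids stretching anatomy. The sketch below computes only the geometry. It is an assumption about a reasonable preprocessing choice, not a description of what processor_config.json or the Flutter plugin does, and `letterbox_geometry` is a hypothetical helper.

```python
from typing import Tuple

TARGET_SIZE = 896  # input size stated in this README


def letterbox_geometry(width: int, height: int,
                       target: int = TARGET_SIZE) -> Tuple[int, int, int, int]:
    """Return (new_width, new_height, pad_left, pad_top).

    The image is scaled so its longer side equals `target`, then centered
    on a target x target canvas; the pads are the top-left offsets.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# A 1792x1344 scan is halved and centered vertically.
geometry = letterbox_geometry(1792, 1344)
print(geometry)
```

With an imaging library such as Pillow, these values could drive a resize followed by a paste onto a square canvas. If the runtime already resizes images itself, this step is unnecessary.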

⚖️ License

Apache 2.0 (Inherited from the base Google MedGemma model).

πŸ“ Citation

If you use this model, please cite both the original MedGemma paper and this mobile optimization:

Original MedGemma Model:

@article{sellergren2025medgemma,
  title={MedGemma Technical Report},
  author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and Traverse, Madeleine and Kohlberger, Timo and Xu, Shawn and Jamil, Fayaz and Hughes, Cían and Lau, Charles and others},
  journal={arXiv preprint arXiv:2507.05201},
  year={2025}
}

This ONNX Conversion:

@misc{medgemma-onnx-mobile,
  title={MedGemma 1.5 ONNX INT4 for Mobile},
  author={EL OUBAYDI, Soufiane},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/eloubaydi/medgemma-1.5-ort-standard}}
}

🤝 Acknowledgments

Base Model: Google MedGemma

Quantization: ONNX Runtime GenAI

Mobile Runtime: flutter_onnxruntime_genai

🆘 Support

Model/Weights Issues: Open an issue on this repository.

ONNX Runtime: onnxruntime-genai issues

Flutter Plugin: flutter_onnxruntime_genai issues

Disclaimer:

This model is for research and development only. It is not a certified medical device and should not be used for primary diagnosis or clinical decision-making.
