Space: gnumanth/MedGemma-Symptoms

gnumanth committed on Jul 5

Commit d93af2a (verified), 1 Parent(s): 70f8d8b

feat: use HF inference

Files changed (2)

README.md +166 -39
app.py +83 -197

README.md CHANGED

@@ -10,64 +10,191 @@ pinned: false
 license: apache-2.0
 ---
-# MedGemma Symptom Analyzer 🏥
-An AI-powered symptom analysis tool built with Google's MedGemma model and Gradio. This application provides preliminary medical insights based on symptom descriptions.
-## Features
-- **Symptom Analysis**: Enter symptoms and get AI-powered medical insights
-- **Differential Diagnosis**: Possible conditions based on presented symptoms
-- **Medical Recommendations**: Next steps and when to seek immediate care
-- **Interactive Interface**: User-friendly Gradio web interface
-- **Example Symptoms**: Pre-built examples to try the system
-## How to Use
-1. **Enter Symptoms**: Describe your symptoms in the text area
-2. **Adjust Settings**: Use the temperature slider to control response creativity
-3. **Analyze**: Click "Analyze Symptoms" to get medical insights
-4. **Review Results**: Read the AI-generated analysis and recommendations
-## Important Disclaimers
-⚠️ **This tool is for educational purposes only and should not replace professional medical advice.**
-- Always consult with healthcare professionals for medical concerns
-- Seek immediate medical attention for severe or emergency symptoms
-- The AI may not always provide accurate medical information
-- This is not a substitute for proper medical diagnosis
-## Model Information
-This application uses Google's **MedGemma-2B** model, specifically fine-tuned for medical applications. The model is optimized with:
-- 4-bit quantization for efficient inference
-- Automatic device mapping for optimal performance
-- Temperature-controlled generation for balanced responses
-## Technical Details
-- **Framework**: Gradio for the web interface
-- **Model**: google/medgemma-2b via Hugging Face Transformers
-- **Optimization**: BitsAndBytesConfig for memory efficiency
-- **Hardware**: GPU-accelerated inference when available
-## Local Development
-To run this locally:
 ```bash
-pip install -r requirements.txt
-python app.py
 ```
-## License
-This project is licensed under the Apache License 2.0.
-## Acknowledgments
-- Google for the MedGemma model
-- Hugging Face for the Transformers library
-- Gradio team for the interface framework

 license: apache-2.0
 ---
+# MedGemma Symptom Analyzer
+A modern medical AI application using Google's MedGemma model via HuggingFace Inference API for symptom analysis and medical consultation.
+## 🏥 Features
+- **AI-Powered Symptom Analysis**: Uses Google's MedGemma-4B model for medical insights
+- **Comprehensive Medical Reports**: Provides differential diagnoses, next steps, and red flags
+- **Interactive Web Interface**: Built with Gradio for easy use
+- **Demo Mode**: Fallback functionality when API is unavailable
+- **Medical Safety**: Includes appropriate disclaimers and safety guidance
+## 🚀 Quick Start
+### 1. Installation
+```bash
+# Clone the repository
+git clone <your-repo-url>
+cd medgemma-symptomps
+# Install dependencies
+pip install -r requirements.txt
+```
+### 2. HuggingFace Access Setup
+The app uses Google's MedGemma model, which requires special access:
+1. **Get HuggingFace Token**:
+   - Visit [HuggingFace Settings](https://huggingface.co/settings/tokens)
+   - Create a new token with `read` permissions
+2. **Request MedGemma Access**:
+   - Visit [google/medgemma-4b-it](https://huggingface.co/google/medgemma-4b-it)
+   - Click "Request access to this model"
+   - Wait for approval from Google (may take some time)
+3. **Set Environment Variable**:
+   ```bash
+   export HF_TOKEN="your_huggingface_token_here"
+   ```
+### 3. Run the Application
+```bash
+python3 app.py
+```
+The app will start on `http://localhost:7860` (or next available port).
+## 🔧 Configuration
+### Environment Variables
+- `HF_TOKEN`: Your HuggingFace API token (required for model access)
+- `FORCE_CPU`: Previously forced CPU-only model loading; it has no effect in the API version, which does not load the model locally
+### Model Access Status
+The app handles different access scenarios:
+- ✅ **Full Access**: MedGemma model available via API
+- ⚠️ **Pending Access**: Waiting for model approval (uses demo mode)
+- ❌ **No Access**: Falls back to demo responses
+## 🧪 Testing
+Test the API connection:
 ```bash
+python3 test_api.py
 ```
+This will verify:
+- HuggingFace API connectivity
+- Token validity
+- Model access permissions
+## 📋 Usage
+### Web Interface
+1. Open the app in your browser
+2. Enter patient symptoms in the text area
+3. Adjust creativity slider if desired
+4. Click "Analyze Symptoms"
+5. Review the comprehensive medical analysis
+### Example Symptoms
+Try these example symptom descriptions:
+- **Flu-like**: "Fever, headache, body aches, and fatigue for 3 days"
+- **Chest pain**: "Sharp chest pain worsening with breathing, shortness of breath"
+- **Digestive**: "Abdominal pain, nausea, and diarrhea after eating"
+## 🔒 Medical Disclaimer
+**⚠️ IMPORTANT**: This tool is for educational purposes only. It should never replace professional medical advice, diagnosis, or treatment. Always consult qualified healthcare professionals for medical concerns.
+## 🏗️ Architecture
+### API-Based Design
+The app now uses HuggingFace Inference API instead of local model loading:
+- **Advantages**:
+  - No local GPU/CPU requirements
+  - Faster startup time
+  - Always up-to-date model
+  - Reduced memory usage
+- **Requirements**:
+  - Internet connection
+  - Valid HuggingFace token
+  - Model access approval
+### File Structure
+```
+medgemma-symptomps/
+├── app.py                    # Main Gradio application
+├── test_api.py              # API connection test script
+├── requirements.txt         # Python dependencies
+├── README.md               # This file
+└── medgemma_app.log        # Application logs
+```
+## 🛠️ Development
+### Key Components
+1. **MedGemmaSymptomAnalyzer**: Main class handling API connections
+2. **Gradio Interface**: Web UI with symptom input and analysis display
+3. **Demo Responses**: Fallback functionality for offline use
+### API Integration
+```python
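+import os
+from huggingface_hub import InferenceClient
+
+hf_token = os.getenv("HF_TOKEN")  # token with access to the gated model
+medical_prompt = "Patient presents with the following symptoms: fever, headache"
+client = InferenceClient(token=hf_token)
+response = client.text_generation(
+    prompt=medical_prompt,
+    model="google/medgemma-4b-it",
+    max_new_tokens=400,
+    temperature=0.7
+)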
+```
+## 🔍 Troubleshooting
+### Common Issues
+1. **404 Model Not Found**:
+   - Ensure you have requested access to MedGemma
+   - Wait for Google's approval
+   - Verify your HuggingFace token is valid
+2. **Demo Mode Only**:
+   - Check your internet connection
+   - Verify HF_TOKEN environment variable
+   - Confirm model access approval status
+3. **Slow Responses**:
+   - API responses may take 10-30 seconds
+   - Consider lowering `max_new_tokens` (set through the `max_length` argument of `analyze_symptoms`) to shorten responses
+### Getting Help
+- Check the application logs: `tail -f medgemma_app.log`
+- Test API connection: `python3 test_api.py`
+- Verify model access: Visit the HuggingFace model page
+## 📚 Resources
+- [MedGemma Model Card](https://huggingface.co/google/medgemma-4b-it)
+- [HuggingFace Inference API](https://huggingface.co/docs/api-inference/index)
+- [Gradio Documentation](https://gradio.app/docs/)
+## 📄 License
+This project uses the MedGemma model which has its own licensing terms. Please review the [model license](https://huggingface.co/google/medgemma-4b-it) before use.
+---
+**Remember**: Always prioritize patient safety and consult healthcare professionals for medical decisions.
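
The three scenarios listed under "Model Access Status" in the README can be told apart from two inputs: whether a token is set, and what error (if any) the API call returned. The sketch below is only an illustration of that mapping. `classify_access` and the status labels are hypothetical names that do not appear in this commit, and the 404 check simply mirrors the string test used in `app.py`; treating every other error as "no access" is an assumption.

```python
from typing import Optional


def classify_access(token: Optional[str], error_message: Optional[str] = None) -> str:
    """Map a token and an optional API error string to an access status.

    Hypothetical helper: the labels follow the README's three scenarios.
    """
    if not token:
        # Without a token the app cannot authenticate and uses demo responses.
        return "no_access"
    if error_message is None:
        # A call that raised no error means the model is reachable.
        return "full_access"
    if "404" in error_message and "medgemma" in error_message.lower():
        # Same heuristic as app.py: a 404 naming the model suggests a gated
        # model whose access request has not been approved yet.
        return "pending_access"
    # Assumption: any other error is treated like missing access.
    return "no_access"


if __name__ == "__main__":
    print(classify_access("hf_example_token"))
```

A string match on the error text is fragile, so a real check would more likely inspect the HTTP status code of the failed request.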

app.py CHANGED

@@ -1,11 +1,9 @@
 import gradio as gr
-import torch
-from transformers import AutoProcessor, AutoModelForImageTextToText
-from PIL import Image
 import requests
 import re
 import logging
-import os
 # Configure logging
 logging.basicConfig(
@@ -20,221 +18,109 @@ logger = logging.getLogger(__name__)
 class MedGemmaSymptomAnalyzer:
     def __init__(self):
-        self.model = None
-        self.processor = None
-        self.model_loaded = False
-        logger.info("Initializing MedGemma Symptom Analyzer...")
-    def load_model(self):
-        """Load MedGemma model with optimizations for deployment and CPU compatibility"""
-        if self.model_loaded:
             return True
-        model_name = "google/medgemma-4b-it"
-        logger.info(f"Loading model: {model_name}")
-        # Check if CPU-only mode is forced via environment variable
-        force_cpu = os.getenv("FORCE_CPU", "false").lower() == "true"
-        # Detect available device and log system info
-        if force_cpu:
-            device = "cpu"
-            logger.info("Forcing CPU usage via FORCE_CPU environment variable")
-        else:
-            device = "cuda" if torch.cuda.is_available() else "cpu"
-            logger.info(f"Device detected: {device}")
-        if device == "cpu":
-            logger.info(f"CPU threads available: {torch.get_num_threads()}")
-        else:
-            logger.info(f"CUDA device: {torch.cuda.get_device_name()}")
         try:
-            # Get HF token from environment (set in Hugging Face Spaces secrets)
             hf_token = os.getenv("HF_TOKEN")
-            if hf_token:
-                logger.info("Using HF_TOKEN for authentication")
-            else:
-                logger.warning("HF_TOKEN not found in environment variables")
-            # Configure for multimodal model
-            if device == "cpu":
-                logger.info("Configuring for CPU-optimized loading...")
-                torch_dtype = torch.float32  # Use float32 for better CPU compatibility
-                device_map = "cpu"  # Explicit CPU device mapping
-                # Set optimal number of threads for CPU inference
-                torch.set_num_threads(4)  # Use 4 threads for better performance
-                loading_kwargs = {
-                    "torch_dtype": torch_dtype,
-                    "device_map": device_map,
-                    "low_cpu_mem_usage": True,  # Optimize memory usage on CPU
-                }
             else:
-                logger.info("Configuring for GPU loading...")
-                torch_dtype = torch.bfloat16
-                device_map = "auto"
-                loading_kwargs = {
-                    "torch_dtype": torch_dtype,
-                    "device_map": device_map,
-                }
-            logger.info("Loading processor...")
-            self.processor = AutoProcessor.from_pretrained(
-                model_name,
-                token=hf_token
-            )
-            logger.info(f"Loading model with dtype={torch_dtype}, device_map={device_map}...")
-            # Force garbage collection before loading
-            import gc
-            gc.collect()
-            self.model = AutoModelForImageTextToText.from_pretrained(
-                model_name,
-                token=hf_token,
-                trust_remote_code=False,  # Security best practice
-                **loading_kwargs
-            )
-            # Processor handles tokenization, no need to set pad token
-            # Ensure model is on correct device
-            if device == "cpu":
-                self.model = self.model.to('cpu')
-                logger.info("Model confirmed on CPU")
-                # Force garbage collection after loading
-                import gc
-                gc.collect()
-            self.model_loaded = True
-            logger.info(f"Model loaded successfully on {device}!")
             return True
-        except torch.cuda.OutOfMemoryError as e:
-            logger.error(f"GPU out of memory: {str(e)}")
-            logger.info("Attempting CPU fallback due to GPU memory constraints...")
-            try:
-                # Force CPU loading if GPU fails - use correct model class
-                self.model = AutoModelForImageTextToText.from_pretrained(
-                    model_name,
-                    token=hf_token,
-                    trust_remote_code=False,
-                    torch_dtype=torch.float32,
-                    device_map="cpu",
-                    low_cpu_mem_usage=True
-                )
-                self.model = self.model.to('cpu')
-                self.model_loaded = True
-                logger.info("Model loaded successfully on CPU after GPU failure!")
-                return True
-            except Exception as fallback_e:
-                logger.error(f"CPU fallback also failed: {str(fallback_e)}")
-                self.model = None
-                self.processor = None  # Fixed: was self.tokenizer
-                self.model_loaded = False
-                return False
-        except ImportError as e:
-            logger.error(f"Missing dependency for model loading: {str(e)}")
-            logger.info("Please ensure all required packages are installed: pip install -r requirements.txt")
-            self.model = None
-            self.processor = None
-            self.model_loaded = False
-            return False
-        except OSError as e:
-            if "disk quota exceeded" in str(e).lower() or "no space left" in str(e).lower():
-                logger.error("Insufficient disk space for model loading")
-                logger.info("Please free up disk space and try again")
-            elif "connection" in str(e).lower() or "timeout" in str(e).lower():
-                logger.error("Network connection issue during model download")
-                logger.info("Please check your internet connection and try again")
-            else:
-                logger.error(f"OS error during model loading: {str(e)}")
-            self.model = None
-            self.processor = None
-            self.model_loaded = False
-            return False
         except Exception as e:
-            logger.error(f"Failed to load model {model_name}: {str(e)}", exc_info=True)
-            logger.warning("Falling back to demo mode due to model loading failure")
-            # Provide helpful troubleshooting info
-            if device == "cpu":
-                logger.info("CPU loading troubleshooting tips:")
-                logger.info("- Ensure sufficient RAM (minimum 8GB recommended)")
-                logger.info("- Check that PyTorch CPU version is installed")
-                logger.info("- Verify HuggingFace token is valid")
-            self.model = None
-            self.processor = None
-            self.model_loaded = False
             return False
     def analyze_symptoms(self, symptoms_text, max_length=512, temperature=0.7):
-        """Analyze symptoms and provide medical insights"""
-        # Try to load model if not already loaded
-        if not self.model_loaded:
-            if not self.load_model():
-                # Fallback to demo response if model fails to load
                 return self._get_demo_response(symptoms_text)
-        if not self.model or not self.processor:
             return self._get_demo_response(symptoms_text)
-        # Format messages for chat template
-        messages = [
-            {
-                "role": "system",
-                "content": [{"type": "text", "text": "You are an expert medical AI assistant."}]
-            },
-            {
-                "role": "user",
-                "content": [{
-                    "type": "text",
-                    "text": f"""Patient presents with the following symptoms: {symptoms_text}
-Based on these symptoms, provide a medical analysis including:
-1. Possible differential diagnoses
-2. Recommended next steps
-3. When to seek immediate medical attention
 Medical Analysis:"""
-                }]
-            }
-        ]
         try:
-            # Apply chat template and tokenize
-            inputs = self.processor.apply_chat_template(
-                messages,
-                add_generation_prompt=True,
-                tokenize=True,
-                return_dict=True,
-                return_tensors="pt"
-            )
-            # Move inputs to model device
-            inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
-            input_len = inputs["input_ids"].shape[-1]
-            # Generate response
-            with torch.inference_mode():
-                generation = self.model.generate(
-                    **inputs,
-                    max_new_tokens=400,
-                    do_sample=True,
-                    temperature=temperature
-                )
-                generation = generation[0][input_len:]
-            # Decode response
-            generated_text = self.processor.decode(generation, skip_special_tokens=True)
-            return generated_text
         except Exception as e:
-            return f"Error during analysis: {str(e)}"
     def _get_demo_response(self, symptoms_text):
         """Provide a demo response when model is not available"""
@@ -242,7 +128,7 @@ Medical Analysis:"""
         # Simple keyword-based demo responses
         if any(word in symptoms_lower for word in ['fever', 'headache', 'fatigue', 'body aches']):
-            return """**DEMO MODE - Model not loaded**
 Based on the symptoms described (fever, headache, fatigue), here's a general analysis:
@@ -265,10 +151,10 @@ Based on the symptoms described (fever, headache, fatigue), here's a general ana
 - Persistent vomiting
 - Symptoms worsen rapidly
-*Note: This is a demo response. For actual medical analysis, the MedGemma model needs to be loaded.*"""
         elif any(word in symptoms_lower for word in ['chest pain', 'breathing', 'shortness']):
-            return """**DEMO MODE - Model not loaded**
 Based on chest-related symptoms, here's a general analysis:
@@ -291,10 +177,10 @@ Based on chest-related symptoms, here's a general analysis:
 - Dizziness or fainting
 - These symptoms require immediate medical care
-*Note: This is a demo response. For actual medical analysis, the MedGemma model needs to be loaded.*"""
         else:
-            return f"""**DEMO MODE - Model not loaded**
 Thank you for describing your symptoms. In demo mode, I can provide general guidance:
@@ -310,9 +196,9 @@ Thank you for describing your symptoms. In demo mode, I can provide general guid
 - You have underlying health conditions
 - You're unsure about the severity
-For a proper AI-powered analysis of your specific symptoms: "{symptoms_text[:100]}...", the MedGemma model would need to be successfully loaded.
-*Note: This is a demo response. For actual medical analysis, the MedGemma model needs to be loaded.*"""
 # Initialize the analyzer
 analyzer = MedGemmaSymptomAnalyzer()

 import gradio as gr
+import os
 import requests
 import re
 import logging
+from huggingface_hub import InferenceClient
 # Configure logging
 logging.basicConfig(
 class MedGemmaSymptomAnalyzer:
     def __init__(self):
+        self.client = None
+        self.model_name = "google/medgemma-4b-it"
+        self.api_connected = False
+        logger.info("Initializing MedGemma Symptom Analyzer with HuggingFace Inference API...")
+    def connect_to_api(self):
+        """Connect to HuggingFace Inference API"""
+        if self.api_connected:
             return True
+        logger.info("Connecting to HuggingFace Inference API...")
         try:
+            # Get HF token from the environment (no other token source is supported)
             hf_token = os.getenv("HF_TOKEN")
+            if hf_token:
+                logger.info("Using HuggingFace token for API authentication")
             else:
+                logger.warning("No HuggingFace token found")
+                return False
+            # Initialize the InferenceClient
+            self.client = InferenceClient(token=hf_token)
+            self.api_connected = True
+            logger.info("✅ Connected to HuggingFace Inference API successfully!")
             return True
         except Exception as e:
+            logger.error(f"Failed to connect to HuggingFace API: {str(e)}")
+            logger.warning("Falling back to demo mode due to API connection failure")
+            self.client = None
+            self.api_connected = False
             return False
     def analyze_symptoms(self, symptoms_text, max_length=512, temperature=0.7):
+        """Analyze symptoms using HuggingFace Inference API"""
+        # Try to connect to API if not already connected
+        if not self.api_connected:
+            if not self.connect_to_api():
+                # Fallback to demo response if API connection fails
                 return self._get_demo_response(symptoms_text)
+        if not self.client:
             return self._get_demo_response(symptoms_text)
+        # Format prompt for text generation
+        prompt = f"""You are an expert medical AI assistant trained to analyze symptoms and provide comprehensive medical insights.
+Patient presents with the following symptoms: {symptoms_text}
+Based on these symptoms, provide a comprehensive medical analysis including:
+1. **Possible Differential Diagnoses**: List the most likely conditions based on the symptoms
+2. **Recommended Next Steps**: Suggest appropriate diagnostic tests or evaluations
+3. **When to Seek Immediate Medical Attention**: Identify red flags requiring urgent care
+4. **General Care Recommendations**: Provide supportive care and lifestyle advice
 Medical Analysis:"""
         try:
+            logger.info("Sending request to HuggingFace Inference API...")
+            # Make API call using text generation
+            response_text = self.client.text_generation(
+                prompt=prompt,
+                model=self.model_name,
+                max_new_tokens=max_length,
+                temperature=temperature,
+                return_full_text=False
+            )
+            # Check if we got a response
+            if response_text:
+                logger.info("✅ Successfully received response from API")
+                return response_text
+            else:
+                logger.warning("No response received from API")
+                return self._get_demo_response(symptoms_text)
         except Exception as e:
+            error_msg = str(e)
+            logger.error(f"Error during API analysis: {error_msg}")
+            # Provide specific error messages for common issues
+            if "404" in error_msg and "medgemma" in error_msg.lower():
+                logger.warning("MedGemma model may require special access approval (gated model)")
+                return f"""**API ACCESS REQUIRED**
+The MedGemma model appears to require special access approval from Google/HuggingFace.
+To use the actual MedGemma model:
+1. Visit: https://huggingface.co/google/medgemma-4b-it
+2. Request access to the gated model
+3. Wait for approval from Google
+4. Ensure your HuggingFace token has the necessary permissions
+**Current Status**: Using demo mode while waiting for model access.
+{self._get_demo_response(symptoms_text)}"""
+            else:
+                # Fallback to demo response on API error
+                return self._get_demo_response(symptoms_text)
     def _get_demo_response(self, symptoms_text):
         """Provide a demo response when model is not available"""
         # Simple keyword-based demo responses
         if any(word in symptoms_lower for word in ['fever', 'headache', 'fatigue', 'body aches']):
+            return """**DEMO MODE - API not connected**
 Based on the symptoms described (fever, headache, fatigue), here's a general analysis:
 - Persistent vomiting
 - Symptoms worsen rapidly
+*Note: This is a demo response. For actual medical analysis, the HuggingFace Inference API needs to be connected.*"""
         elif any(word in symptoms_lower for word in ['chest pain', 'breathing', 'shortness']):
+            return """**DEMO MODE - API not connected**
 Based on chest-related symptoms, here's a general analysis:
 - Dizziness or fainting
 - These symptoms require immediate medical care
+*Note: This is a demo response. For actual medical analysis, the HuggingFace Inference API needs to be connected.*"""
         else:
+            return f"""**DEMO MODE - API not connected**
 Thank you for describing your symptoms. In demo mode, I can provide general guidance:
 - You have underlying health conditions
 - You're unsure about the severity
+For a proper AI-powered analysis of your specific symptoms: "{symptoms_text[:100]}...", the HuggingFace Inference API would need to be successfully connected.
+*Note: This is a demo response. For actual medical analysis, the HuggingFace Inference API needs to be connected.*"""
 # Initialize the analyzer
 analyzer = MedGemmaSymptomAnalyzer()
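
The control flow this commit introduces (call the API, fall back to a keyword-based demo response on an error or an empty result) can be exercised without network access by injecting the generation function. The following is a minimal sketch under that assumption, not the code from `app.py`: `demo_category` and `analyze_with_fallback` are hypothetical names, and the demo function returns a short category label instead of the full demo text.

```python
from typing import Callable

# Keyword groups copied from the demo branch conditions in app.py.
FLU_KEYWORDS = ("fever", "headache", "fatigue", "body aches")
CHEST_KEYWORDS = ("chest pain", "breathing", "shortness")


def demo_category(symptoms: str) -> str:
    """Pick a demo category by substring match on the lowercased input."""
    lowered = symptoms.lower()
    if any(keyword in lowered for keyword in FLU_KEYWORDS):
        return "flu_like"
    if any(keyword in lowered for keyword in CHEST_KEYWORDS):
        return "chest"
    return "general"


def analyze_with_fallback(
    symptoms: str,
    generate: Callable[[str], str],
    demo: Callable[[str], str] = demo_category,
) -> str:
    """Return the generated text, or the demo result on error or empty output."""
    try:
        text = generate(symptoms)
    except Exception:
        # Any API failure routes to the demo path, as in the commit.
        return demo(symptoms)
    return text if text else demo(symptoms)


if __name__ == "__main__":
    def failing_generate(_: str) -> str:
        raise RuntimeError("simulated API failure")

    print(analyze_with_fallback("Sharp chest pain", failing_generate))
```

Passing the generator as an argument keeps the fallback logic testable with stubs; in the app, `generate` would wrap the `InferenceClient` call.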