🤰 High-Risk Pregnancy Prediction Model

Model Description

A stacking ensemble classifier that predicts whether a pregnant patient is at high risk based on clinical and demographic data collected from rural healthcare settings in India.

The model uses:

  • XGBoost + Random Forest + Gradient Boosting as base learners
  • Logistic Regression as the meta-learner (stacking)
  • SMOTE oversampling to handle class imbalance (5.58:1 No Risk:Risk)
  • Decision-threshold tuning to maximize F1 on the minority (Risk) class
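
To make the stacking step concrete, here is a minimal pure-Python sketch of how a Logistic Regression meta-learner could turn the three base learners' Risk probabilities into one score, which is then compared against the tuned threshold. It assumes the meta-learner sees one Risk probability per base learner. This is what scikit-learn's StackingClassifier does for binary problems with predict_proba, but the card does not say which stacking implementation was used. The weights and intercept are hypothetical placeholders, not the trained model's coefficients.

```python
import math
from typing import Sequence


def stacked_risk_probability(
    base_probs: Sequence[float], weights: Sequence[float], intercept: float
) -> float:
    """Logistic-regression meta-learner: sigmoid(intercept + sum(w_i * p_i)).

    base_probs: Risk-class probabilities from the base learners
    (e.g. XGBoost, Random Forest, Gradient Boosting, in that order).
    """
    if len(base_probs) != len(weights):
        raise ValueError("need one weight per base learner")
    z = intercept + sum(w * p for w, p in zip(weights, base_probs))
    return 1.0 / (1.0 + math.exp(-z))


def classify(prob: float, threshold: float = 0.29) -> str:
    # A threshold below the default 0.5 trades precision for recall on the Risk class.
    return "Risk" if prob >= threshold else "No Risk"


# Hypothetical meta-learner coefficients, for illustration only.
WEIGHTS = [2.0, 1.5, 1.8]
INTERCEPT = -3.0

p = stacked_risk_probability([0.45, 0.50, 0.40], WEIGHTS, INTERCEPT)
print(f"{p:.3f}", classify(p), classify(p, 0.5))  # 0.348 Risk No Risk
```

During training, a stacking ensemble fits the meta-learner on out-of-fold base-learner predictions (scikit-learn's StackingClassifier uses internal cross-validation for this). This keeps the meta-learner from simply trusting base learners that have memorized the training rows.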

📊 Performance Metrics

Metric                   Score
F1 Score (Risk class)    0.4901
Accuracy                 0.8075
ROC-AUC                  0.8138
Average Precision        0.4867
Optimal Threshold        0.29
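
The Optimal Threshold row is the cut-off on the Risk probability that gave the best Risk-class F1. Below is a minimal pure-Python sketch of such a search. The 0.01 grid step, the function names and the toy labels/probabilities are assumptions for illustration, not the card's actual tuning code.

```python
from typing import Sequence, Tuple


def f1_at_threshold(y_true: Sequence[int], probs: Sequence[float], t: float) -> float:
    """F1 for the positive (Risk = 1) class when predicting 1 iff prob >= t."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= t)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= t)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < t)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(
    y_true: Sequence[int], probs: Sequence[float], step: float = 0.01
) -> Tuple[float, float]:
    """Return (threshold, F1) maximizing F1 over the grid step, 2*step, ..., < 1."""
    best_t, best_f1 = 0.5, -1.0
    n_steps = int(round(1 / step))
    for i in range(1, n_steps):
        t = round(i * step, 10)
        f1 = f1_at_threshold(y_true, probs, t)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy data (not from the model card).
y_toy = [0, 0, 0, 0, 1, 1]
probs_toy = [0.10, 0.20, 0.25, 0.60, 0.30, 0.80]
print(best_threshold(y_toy, probs_toy))  # (0.26, 0.8)
```

The sweep picks whichever threshold scores best on the data it is run on. Running it on a validation split, rather than on the data used to report metrics, keeps the reported F1 an unbiased estimate.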

Confusion Matrix (rows = actual, columns = predicted; class order: No Risk, Risk)

[[286  53]
 [ 24  37]]

Classification Report

              precision    recall  f1-score   support

     No Risk       0.92      0.84      0.88       339
        Risk       0.41      0.61      0.49        61

    accuracy                           0.81       400
   macro avg       0.67      0.73      0.69       400
weighted avg       0.84      0.81      0.82       400
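
The Risk-class figures in the report and the metrics table can be recomputed from the confusion matrix alone. The sketch below reads rows as actual and columns as predicted. That reading is inferred from the row sums matching the supports (339 and 61), not stated explicitly in the card.

```python
# Confusion matrix from the card: rows = actual, columns = predicted,
# class order [No Risk, Risk].
cm = [[286, 53],
      [24, 37]]

tn, fp = cm[0]
fn, tp = cm[1]

precision = tp / (tp + fp)                  # 37 / 90
recall = tp / (tp + fn)                     # 37 / 61
f1 = 2 * tp / (2 * tp + fp + fn)            # 74 / 151
accuracy = (tn + tp) / (tn + fp + fn + tp)  # 323 / 400

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.4f} accuracy={accuracy:.4f}")
# precision=0.41 recall=0.61 f1=0.4901 accuracy=0.8075
```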

🔬 Features Used (42 total)

Raw Clinical Features

  • Vitals: Blood pressure (systolic/diastolic), heart rate, SpO2, temperature, respiratory rate
  • Lab Values: Random blood sugar, HbA1c, hemoglobin
  • Body: BMI, edema severity
  • Obstetric History: Gravida, para, live children, abortions, deaths, gestational age (weeks)
  • Symptoms: Symptom score (0-10)
  • Demographics: Age, village, device source

Engineered Features

Feature                  Description
pulse_pressure           Systolic BP - diastolic BP
mean_arterial_pressure   DBP + (pulse pressure / 3)
hypertension_flag        SBP ≥ 140 or DBP ≥ 90 mmHg
severe_hypertension      SBP ≥ 160 or DBP ≥ 110 mmHg
anemia_flag              Hemoglobin < 11 g/dL
gdm_flag                 HbA1c ≥ 6.5% or random blood sugar ≥ 200 mg/dL
advanced_age             Age ≥ 35 years
teenage_pregnancy        Age ≤ 19 years
grand_multipara          Gravida ≥ 5
total_losses             Abortions + deaths
previous_loss_rate       Total losses / gravida
bp_risk_score            Composite BP score
tachycardia              Heart rate > 100 bpm
fever_flag               Temperature ≥ 100.4 °F

Also engineered, without a listed definition: low_spo2, underweight, obese, bradycardia, high_rr, live_child_ratio, age_gravida_interaction.
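
The documented formulas can be collected into a small helper that derives engineered columns from raw inputs. The Usage example otherwise fills these in by hand. This is a sketch under assumptions. The function name engineer_features is hypothetical, and the thresholds are copied from the table above. Features without a listed formula (bp_risk_score and the ones named just above) are skipped rather than guessed.

```python
from typing import Dict


def engineer_features(raw: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical helper computing only the engineered features whose
    formulas are documented in the table above. Keys follow the raw column
    names used in the Usage example."""
    sbp = raw["systolic_bp_mmHg"]
    dbp = raw["diastolic_bp_mmHg"]
    gravida = raw["gravida_G"]
    pulse_pressure = sbp - dbp
    total_losses = raw["abortion_A"] + raw["death_D"]
    return {
        "pulse_pressure": pulse_pressure,
        "mean_arterial_pressure": dbp + pulse_pressure / 3,
        "hypertension_flag": int(sbp >= 140 or dbp >= 90),
        "severe_hypertension": int(sbp >= 160 or dbp >= 110),
        "anemia_flag": int(raw["hemoglobin_g_dL"] < 11),
        "gdm_flag": int(raw["hba1c_percent"] >= 6.5
                        or raw["random_blood_sugar_mg_dL"] >= 200),
        "advanced_age": int(raw["age_years"] >= 35),
        "teenage_pregnancy": int(raw["age_years"] <= 19),
        "grand_multipara": int(gravida >= 5),
        "total_losses": total_losses,
        # Assumption: return 0.0 when gravida is 0 (the card does not say).
        "previous_loss_rate": total_losses / gravida if gravida else 0.0,
        "tachycardia": int(raw["heart_rate_bpm"] > 100),
        "fever_flag": int(raw["body_temperature_F"] >= 100.4),
    }


# Raw values taken from the patient in the Usage example.
raw = {
    "age_years": 32, "gravida_G": 3, "abortion_A": 1, "death_D": 0,
    "systolic_bp_mmHg": 145, "diastolic_bp_mmHg": 95,
    "random_blood_sugar_mg_dL": 180, "body_temperature_F": 99.2,
    "heart_rate_bpm": 95, "hemoglobin_g_dL": 9.5, "hba1c_percent": 6.8,
}
print(engineer_features(raw))
```

For that patient the helper reproduces the hand-entered values in the Usage example: pulse pressure 50, mean arterial pressure ≈ 111.67 and previous loss rate ≈ 0.33.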

Top 10 Most Important Features

Feature              Importance
anemia_flag          0.0952291
edema_severity       0.093912
gdm_flag             0.0810703
underweight          0.0531509
hypertension_flag    0.0503211
advanced_age         0.0414953
device_source        0.0294894
teenage_pregnancy    0.0292978
fever_flag           0.0257948
hemoglobin_g_dL      0.0252757

🚀 Usage

import joblib
import pandas as pd
from huggingface_hub import hf_hub_download

# Download the model bundle from the Hugging Face Hub
bundle_path = hf_hub_download(
    repo_id="yatharthkohli/high-risk-pregnancy-prediction",
    filename="model_bundle.joblib",
)
bundle = joblib.load(bundle_path)

preprocessor = bundle['preprocessor']
model = bundle['model']
le = bundle['label_encoder']
threshold = bundle['optimal_threshold']

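# Prepare one patient record. The preprocessor expects the raw features AND the
# engineered features, so the engineered values below must be computed by hand.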
patient = pd.DataFrame([{
    'age_years': 32, 'gravida_G': 3, 'para_P': 1, 'live_child_L': 1,
    'abortion_A': 1, 'death_D': 0, 'gestational_age_weeks': 28,
    'systolic_bp_mmHg': 145, 'diastolic_bp_mmHg': 95,
    'random_blood_sugar_mg_dL': 180, 'body_temperature_F': 99.2,
    'heart_rate_bpm': 95, 'hemoglobin_g_dL': 9.5, 'hba1c_percent': 6.8,
    'respiratory_rate_bpm': 22, 'bmi': 32, 'spo2_percent': 96,
    'symptoms_score_0_10': 6, 'device_source': 'ASHA Mobile',
    'village': 'Rampur', 'edema_severity': 'Moderate',
    # Engineered features (must be consistent with the raw values above)
    'total_losses': 1, 'pulse_pressure': 50,
    'mean_arterial_pressure': 111.67,
    'hypertension_flag': 1, 'anemia_flag': 1, 'gdm_flag': 1,
    'advanced_age': 0, 'teenage_pregnancy': 0, 'grand_multipara': 0,
    'low_spo2': 0, 'tachycardia': 0, 'underweight': 0, 'obese': 1,
    'severe_hypertension': 0, 'bp_risk_score': 2.4,
    'previous_loss_rate': 0.33, 'live_child_ratio': 1.0,
    'age_gravida_interaction': 96, 'fever_flag': 0,
    'bradycardia': 0, 'high_rr': 0
}])

# Predict: transform, take the Risk-class probability, apply the tuned threshold
X_processed = preprocessor.transform(patient)
prob = model.predict_proba(X_processed)[0, 1]
prediction = le.inverse_transform([int(prob >= threshold)])[0]
print(f"Prediction: {prediction}")
print(f"Risk Probability: {prob:.2%}")

📈 Training Details

  • Dataset: yatharthkohli/hrp_dataset (2,000 patient visit records)
  • Train/Test Split: 80/20 stratified (1600/400)
  • Class Imbalance: 5.58:1 (No Risk:Risk), handled with SMOTE (0.6 ratio)
  • Threshold: Tuned on the test set to maximize F1 for the Risk class, so the reported test metrics are not fully independent of threshold selection
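
SMOTE creates synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority-class neighbours. Below is a minimal pure-Python sketch of that idea. It assumes "0.6 ratio" means the desired Risk:No Risk ratio after resampling, which is how imbalanced-learn interprets a float sampling_strategy. The helper names are hypothetical.

The card's numbers imply 1,357 No Risk and 243 Risk rows in the 1,600-row training split (1,696/304 overall minus the 339/61 test counts). At 0.6, that would mean about 571 synthetic Risk rows. Library implementations use a fitted nearest-neighbour index instead of the brute-force search below.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def n_synthetic(n_majority: int, n_minority: int, ratio: float) -> int:
    """Synthetic samples needed so minority/majority reaches `ratio` after resampling."""
    return max(0, int(n_majority * ratio) - n_minority)


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Minimal SMOTE sketch: interpolate between a random minority sample and
    one of its k nearest minority-class neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Brute-force k nearest neighbours of x among the other minority samples.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


print(n_synthetic(1357, 243, 0.6))  # 571
print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_new=3, k=2))
```

Oversampling is normally applied only to the training split, so the test set keeps the real class balance.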

⚠️ Limitations & Disclaimer

  • Trained on data from rural Indian healthcare settings, so it may not generalize to other populations
  • NOT a substitute for professional medical diagnosis; designed as a clinical decision-support tool
  • Performance may vary with different demographics, healthcare settings, or data collection methods
  • The model should be validated on external datasets before clinical deployment

📋 License

Apache 2.0
