🤰 High-Risk Pregnancy Prediction Model

Model Description

A stacking ensemble classifier that predicts whether a pregnant patient is at high risk based on clinical and demographic data collected from rural healthcare settings in India.

The model uses:

XGBoost + Random Forest + Gradient Boosting as base learners
Logistic Regression as the meta-learner (stacking)
SMOTE oversampling to handle class imbalance (5.58:1 ratio)
Decision-threshold tuning to maximize F1 on the minority (Risk) class

📊 Performance Metrics

Metric	Score
F1 Score (Risk class)	0.4901
Accuracy	0.8075
ROC-AUC	0.8138
Average Precision	0.4867
Optimal Threshold	0.29

Confusion Matrix

[[286  53]
 [ 24  37]]

Classification Report

              precision    recall  f1-score   support

     No Risk       0.92      0.84      0.88       339
        Risk       0.41      0.61      0.49        61

    accuracy                           0.81       400
   macro avg       0.67      0.73      0.69       400
weighted avg       0.84      0.81      0.82       400
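
The headline numbers can be re-derived from the confusion matrix alone. The short check below assumes the scikit-learn layout (rows are true labels, columns are predicted labels, in the order No Risk, Risk), which matches the support counts in the report.

```python
# Confusion matrix from this card: rows = true class, columns = predicted class.
cm = [[286, 53],
      [24, 37]]

tn, fp = cm[0]  # true No Risk: predicted No Risk / predicted Risk
fn, tp = cm[1]  # true Risk:    predicted No Risk / predicted Risk
total = tn + fp + fn + tp

precision = tp / (tp + fp)  # share of Risk predictions that are correct
recall = tp / (tp + fn)     # share of true Risk cases that are caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.4f} accuracy={accuracy:.4f}")
# -> precision=0.41 recall=0.61 f1=0.4901 accuracy=0.8075
```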

🔬 Features Used (42 total)

Raw Clinical Features

Vitals: Blood pressure (systolic/diastolic), heart rate, SpO2, temperature, respiratory rate
Lab Values: Random blood sugar, HbA1c, hemoglobin
Body: BMI, edema severity
Obstetric History: Gravida, para, live children, abortions, deaths
Demographics: Age, village, device source

Engineered Features

Feature	Description
`pulse_pressure`	Systolic BP - diastolic BP (mmHg)
`mean_arterial_pressure`	Diastolic BP + (pulse pressure / 3), in mmHg
`hypertension_flag`	SBP ≥ 140 mmHg or DBP ≥ 90 mmHg
`severe_hypertension`	SBP ≥ 160 mmHg or DBP ≥ 110 mmHg
`anemia_flag`	Hemoglobin < 11 g/dL
`gdm_flag`	HbA1c ≥ 6.5% or random blood sugar ≥ 200 mg/dL
`advanced_age`	Age ≥ 35 years
`teenage_pregnancy`	Age ≤ 19 years
`grand_multipara`	Gravida ≥ 5
`total_losses`	Abortions + deaths
`previous_loss_rate`	Total losses / gravida
`bp_risk_score`	Composite BP score
`tachycardia`	Heart rate > 100 bpm
`fever_flag`	Temperature ≥ 100.4°F
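
The rules in the table can be sketched as a small helper. The function name engineer_features is hypothetical, the raw column names are taken from the Usage example, and the zero-gravida guard is an assumption. bp_risk_score and the engineered features that do not appear in the table (for example underweight or low_spo2) are left out because this card does not give their formulas.

```python
def engineer_features(raw):
    """Derive the engineered features defined in the table from raw measurements."""
    sbp = raw["systolic_bp_mmHg"]
    dbp = raw["diastolic_bp_mmHg"]
    age = raw["age_years"]
    gravida = raw["gravida_G"]

    pulse_pressure = sbp - dbp
    total_losses = raw["abortion_A"] + raw["death_D"]

    return {
        "pulse_pressure": pulse_pressure,
        "mean_arterial_pressure": dbp + pulse_pressure / 3,
        "hypertension_flag": int(sbp >= 140 or dbp >= 90),
        "severe_hypertension": int(sbp >= 160 or dbp >= 110),
        "anemia_flag": int(raw["hemoglobin_g_dL"] < 11),
        "gdm_flag": int(
            raw["hba1c_percent"] >= 6.5 or raw["random_blood_sugar_mg_dL"] >= 200
        ),
        "advanced_age": int(age >= 35),
        "teenage_pregnancy": int(age <= 19),
        "grand_multipara": int(gravida >= 5),
        "total_losses": total_losses,
        # Assumption: return 0.0 when gravida is 0 to avoid division by zero.
        "previous_loss_rate": total_losses / gravida if gravida else 0.0,
        "tachycardia": int(raw["heart_rate_bpm"] > 100),
        "fever_flag": int(raw["body_temperature_F"] >= 100.4),
    }


# Raw values of the example patient from the Usage section.
raw_patient = {
    "age_years": 32, "gravida_G": 3, "abortion_A": 1, "death_D": 0,
    "systolic_bp_mmHg": 145, "diastolic_bp_mmHg": 95,
    "random_blood_sugar_mg_dL": 180, "body_temperature_F": 99.2,
    "heart_rate_bpm": 95, "hemoglobin_g_dL": 9.5, "hba1c_percent": 6.8,
}
features = engineer_features(raw_patient)
print(features)
```

For this patient the helper reproduces the hand-entered values in the Usage example, such as a pulse pressure of 50 and a mean arterial pressure of about 111.67.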

Top 10 Most Important Features

Feature	Importance
anemia_flag	0.0952291
edema_severity	0.093912
gdm_flag	0.0810703
underweight	0.0531509
hypertension_flag	0.0503211
advanced_age	0.0414953
device_source	0.0294894
teenage_pregnancy	0.0292978
fever_flag	0.0257948
hemoglobin_g_dL	0.0252757

🚀 Usage

import joblib
import pandas as pd
from huggingface_hub import hf_hub_download

# Download model bundle
bundle_path = hf_hub_download(
    "yatharthkohli/high-risk-pregnancy-prediction", 
    "model_bundle.joblib"
)
bundle = joblib.load(bundle_path)

preprocessor = bundle['preprocessor']
model = bundle['model']
le = bundle['label_encoder']
threshold = bundle['optimal_threshold']

# Prepare patient data: the preprocessor expects the raw features plus the
# engineered features, so the engineered values must be computed beforehand
patient = pd.DataFrame([{
    'age_years': 32, 'gravida_G': 3, 'para_P': 1, 'live_child_L': 1,
    'abortion_A': 1, 'death_D': 0, 'gestational_age_weeks': 28,
    'systolic_bp_mmHg': 145, 'diastolic_bp_mmHg': 95,
    'random_blood_sugar_mg_dL': 180, 'body_temperature_F': 99.2,
    'heart_rate_bpm': 95, 'hemoglobin_g_dL': 9.5, 'hba1c_percent': 6.8,
    'respiratory_rate_bpm': 22, 'bmi': 32, 'spo2_percent': 96,
    'symptoms_score_0_10': 6, 'device_source': 'ASHA Mobile',
    'village': 'Rampur', 'edema_severity': 'Moderate',
    # Engineered features, derived from the raw values above
    'total_losses': 1, 'pulse_pressure': 50, 
    'mean_arterial_pressure': 111.67,
    'hypertension_flag': 1, 'anemia_flag': 1, 'gdm_flag': 1,
    'advanced_age': 0, 'teenage_pregnancy': 0, 'grand_multipara': 0,
    'low_spo2': 0, 'tachycardia': 0, 'underweight': 0, 'obese': 1,
    'severe_hypertension': 0, 'bp_risk_score': 2.4,
    'previous_loss_rate': 0.33, 'live_child_ratio': 1.0,
    'age_gravida_interaction': 96, 'fever_flag': 0,
    'bradycardia': 0, 'high_rr': 0
}])

# Predict
X_processed = preprocessor.transform(patient)
prob = model.predict_proba(X_processed)[:, 1][0]
# int(...) works whether the comparison yields a Python bool or a NumPy bool
prediction = le.inverse_transform([int(prob >= threshold)])[0]
print(f"Prediction: {prediction}")
print(f"Risk Probability: {prob:.2%}")

📈 Training Details

Dataset: yatharthkohli/hrp_dataset — 2,000 patient visit records
Train/Test Split: 80/20 stratified (1600/400)
Class Imbalance: 5.58:1 (No Risk:Risk), handled with SMOTE (0.6 ratio)
Threshold: 0.29, optimized on the test set to maximize F1 for the Risk class
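
Threshold tuning of this kind can be sketched as a simple sweep: score each candidate cut-off by the F1 it yields for the positive (Risk) class and keep the best one. The helper names, the candidate grid, and the toy labels and probabilities below are all illustrative assumptions, not values from the training run. The sketch uses arrays named y_val and val_probs because tuning on a split separate from the final test set is a common way to keep the reported test metrics unbiased.

```python
def f1_at_threshold(y_true, probs, threshold):
    """F1 for the positive class when predicting 1 for probs >= threshold."""
    tp = fp = fn = 0
    for label, p in zip(y_true, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(y_true, probs, candidates=None):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    if candidates is None:
        # Assumed grid: 0.05 to 0.95 in steps of 0.01.
        candidates = [round(0.01 * i, 2) for i in range(5, 96)]
    return max(candidates, key=lambda t: f1_at_threshold(y_true, probs, t))


# Toy held-out labels (1 = Risk) and predicted Risk probabilities.
y_val = [0, 0, 0, 0, 1, 0, 1, 1]
val_probs = [0.05, 0.10, 0.20, 0.35, 0.30, 0.15, 0.60, 0.80]

threshold = best_threshold(y_val, val_probs)
best_f1 = f1_at_threshold(y_val, val_probs, threshold)
print(f"best threshold={threshold:.2f}, F1={best_f1:.3f}")
```

On imbalanced data the selected threshold often falls below 0.5, as it does for the 0.29 reported here, because lowering the cut-off trades some precision for recall on the rare class.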

⚠️ Limitations & Disclaimer

Trained on data from rural Indian healthcare settings — may not generalize to other populations
NOT a substitute for professional medical diagnosis — designed as clinical decision support
Performance may vary with different demographics, healthcare settings, or data collection methods
The model should be validated on external datasets before clinical deployment

📋 License

Apache 2.0
