High-Risk Pregnancy Prediction Model
Model Description
A stacking ensemble classifier that predicts whether a pregnant patient is at high risk based on clinical and demographic data collected from rural healthcare settings in India.
The model uses:
- XGBoost + Random Forest + Gradient Boosting as base learners
- Logistic Regression as the meta-learner (stacking)
- SMOTE oversampling to handle class imbalance (5.58:1 ratio)
- Optimal threshold tuning for maximizing F1 on the minority (Risk) class
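The threshold tuning step listed above can be pictured as a sweep over candidate cut-offs that keeps the one with the highest F1 on the positive (Risk) class. The sketch below is a minimal pure-Python illustration of that idea, not the author's training code: the function names `f1_at_threshold` and `best_threshold`, the 0.01 grid, and the toy labels and probabilities are all assumptions made for the example.

```python
def f1_at_threshold(y_true, probs, threshold):
    """F1 for the positive class (label 1) when predicting 1 if prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probs, grid=None):
    """Return (threshold, f1) with the highest F1; ties keep the lowest threshold."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99 (assumed grid)
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        f1 = f1_at_threshold(y_true, probs, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy data (hypothetical): 1 = Risk, 0 = No Risk
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
probs = [0.05, 0.10, 0.12, 0.20, 0.25, 0.35, 0.30, 0.45, 0.15, 0.80]

threshold, f1 = best_threshold(y_true, probs)
print(f"best threshold = {threshold:.2f}, F1 = {f1:.3f}")
```

Running the sweep on a held-out validation split rather than on the final test set keeps the reported test metrics unbiased.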
Performance Metrics
| Metric | Score |
|---|---|
| F1 Score (Risk class) | 0.4901 |
| Accuracy | 0.8075 |
| ROC-AUC | 0.8138 |
| Average Precision | 0.4867 |
| Optimal Threshold | 0.29 |
Confusion Matrix
| | Predicted No Risk | Predicted Risk |
|---|---|---|
| Actual No Risk | 286 | 53 |
| Actual Risk | 24 | 37 |
Classification Report
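| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| No Risk | 0.92 | 0.84 | 0.88 | 339 |
| Risk | 0.41 | 0.61 | 0.49 | 61 |
| Accuracy | | | 0.81 | 400 |
| Macro avg | 0.67 | 0.73 | 0.69 | 400 |
| Weighted avg | 0.84 | 0.81 | 0.82 | 400 |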
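The headline metrics can be re-derived from the confusion matrix, which is a quick consistency check. The sketch below assumes rows are actual classes and columns are predicted classes, in the order (No Risk, Risk); that reading matches the support counts of 339 and 61 in the report. Variable names are illustrative.

```python
# Confusion matrix from this card: rows = actual, columns = predicted
# (assumed order: No Risk, Risk).
tn, fp = 286, 53
fn, tp = 24, 37

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision_risk = tp / (tp + fp)
recall_risk = tp / (tp + fn)
f1_risk = 2 * precision_risk * recall_risk / (precision_risk + recall_risk)

print(f"accuracy       = {accuracy:.4f}")        # 0.8075
print(f"precision_risk = {precision_risk:.2f}")  # 0.41
print(f"recall_risk    = {recall_risk:.2f}")     # 0.61
print(f"f1_risk        = {f1_risk:.4f}")         # 0.4901
```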
Features Used (42 total)
Raw Clinical Features
- Vitals: Blood pressure (systolic/diastolic), heart rate, SpO2, temperature, respiratory rate
- Lab Values: Random blood sugar, HbA1c, hemoglobin
- Body: BMI, edema severity
- Obstetric History: Gravida, para, live children, abortions, deaths
- Demographics: Age, village, device source
Engineered Features
| Feature | Description |
|---|---|
| pulse_pressure | Systolic - Diastolic BP |
| mean_arterial_pressure | DBP + (Pulse Pressure / 3) |
| hypertension_flag | SBP ≥ 140 or DBP ≥ 90 |
| severe_hypertension | SBP ≥ 160 or DBP ≥ 110 |
| anemia_flag | Hemoglobin < 11 g/dL |
| gdm_flag | HbA1c ≥ 6.5 or Blood Sugar ≥ 200 |
| advanced_age | Age ≥ 35 |
| teenage_pregnancy | Age ≤ 19 |
| grand_multipara | Gravida ≥ 5 |
| total_losses | Abortions + Deaths |
| previous_loss_rate | Total losses / Gravida |
| bp_risk_score | Composite BP score |
| tachycardia | Heart rate > 100 |
| fever_flag | Temperature ≥ 100.4°F |
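The rules in the table above translate directly into code. The sketch below is one possible implementation, not the author's pipeline: `engineer_features` is a hypothetical helper, the raw column names are taken from the Usage example, and `bp_risk_score` is omitted because this card does not give its formula. The two-decimal rounding and the zero-gravida guard are assumptions, added only so the output matches the values shown in the Usage example.

```python
def engineer_features(raw):
    """Derive the flags and ratios from the table above for one patient record.

    `raw` is a dict keyed by the raw column names used in the Usage example.
    """
    sbp = raw["systolic_bp_mmHg"]
    dbp = raw["diastolic_bp_mmHg"]
    pulse_pressure = sbp - dbp
    total_losses = raw["abortion_A"] + raw["death_D"]
    gravida = raw["gravida_G"]

    return {
        "pulse_pressure": pulse_pressure,
        "mean_arterial_pressure": round(dbp + pulse_pressure / 3, 2),
        "hypertension_flag": int(sbp >= 140 or dbp >= 90),
        "severe_hypertension": int(sbp >= 160 or dbp >= 110),
        "anemia_flag": int(raw["hemoglobin_g_dL"] < 11),
        "gdm_flag": int(raw["hba1c_percent"] >= 6.5
                        or raw["random_blood_sugar_mg_dL"] >= 200),
        "advanced_age": int(raw["age_years"] >= 35),
        "teenage_pregnancy": int(raw["age_years"] <= 19),
        "grand_multipara": int(gravida >= 5),
        "total_losses": total_losses,
        # Assumption: a gravida of 0 yields a loss rate of 0.0.
        "previous_loss_rate": round(total_losses / gravida, 2) if gravida else 0.0,
        "tachycardia": int(raw["heart_rate_bpm"] > 100),
        "fever_flag": int(raw["body_temperature_F"] >= 100.4),
    }


# Raw values copied from the Usage example
patient_raw = {
    "age_years": 32, "gravida_G": 3, "abortion_A": 1, "death_D": 0,
    "systolic_bp_mmHg": 145, "diastolic_bp_mmHg": 95,
    "random_blood_sugar_mg_dL": 180, "body_temperature_F": 99.2,
    "heart_rate_bpm": 95, "hemoglobin_g_dL": 9.5, "hba1c_percent": 6.8,
}
features = engineer_features(patient_raw)
print(features)
```

The remaining engineered columns in the Usage example (for instance `low_spo2`, `obese`, or `age_gravida_interaction`) are not defined in the table, so they are left out here.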
Top 10 Most Important Features
| feature | importance |
|---|---|
| anemia_flag | 0.0952291 |
| edema_severity | 0.093912 |
| gdm_flag | 0.0810703 |
| underweight | 0.0531509 |
| hypertension_flag | 0.0503211 |
| advanced_age | 0.0414953 |
| device_source | 0.0294894 |
| teenage_pregnancy | 0.0292978 |
| fever_flag | 0.0257948 |
| hemoglobin_g_dL | 0.0252757 |
Usage
import joblib
import pandas as pd
from huggingface_hub import hf_hub_download
# Download model bundle
bundle_path = hf_hub_download(
    "yatharthkohli/high-risk-pregnancy-prediction",
    "model_bundle.joblib"
)
bundle = joblib.load(bundle_path)
preprocessor = bundle['preprocessor']
model = bundle['model']
le = bundle['label_encoder']
threshold = bundle['optimal_threshold']
# Prepare patient data: the preprocessor expects the raw features plus the
# engineered features, so the engineered values must be computed beforehand
patient = pd.DataFrame([{
    # Raw features
    'age_years': 32, 'gravida_G': 3, 'para_P': 1, 'live_child_L': 1,
    'abortion_A': 1, 'death_D': 0, 'gestational_age_weeks': 28,
    'systolic_bp_mmHg': 145, 'diastolic_bp_mmHg': 95,
    'random_blood_sugar_mg_dL': 180, 'body_temperature_F': 99.2,
    'heart_rate_bpm': 95, 'hemoglobin_g_dL': 9.5, 'hba1c_percent': 6.8,
    'respiratory_rate_bpm': 22, 'bmi': 32, 'spo2_percent': 96,
    'symptoms_score_0_10': 6, 'device_source': 'ASHA Mobile',
    'village': 'Rampur', 'edema_severity': 'Moderate',
    # Engineered features
    'total_losses': 1, 'pulse_pressure': 50,
    'mean_arterial_pressure': 111.67,
    'hypertension_flag': 1, 'anemia_flag': 1, 'gdm_flag': 1,
    'advanced_age': 0, 'teenage_pregnancy': 0, 'grand_multipara': 0,
    'low_spo2': 0, 'tachycardia': 0, 'underweight': 0, 'obese': 1,
    'severe_hypertension': 0, 'bp_risk_score': 2.4,
    'previous_loss_rate': 0.33, 'live_child_ratio': 1.0,
    'age_gravida_interaction': 96, 'fever_flag': 0,
    'bradycardia': 0, 'high_rr': 0
}])
# Predict
X_processed = preprocessor.transform(patient)
prob = model.predict_proba(X_processed)[:, 1][0]
# Apply the tuned threshold instead of the default 0.5 cut-off
prediction = le.inverse_transform([int(prob >= threshold)])[0]
print(f"Prediction: {prediction}")
print(f"Risk Probability: {prob:.2%}")
Training Details
- Dataset: yatharthkohli/hrp_dataset (2,000 patient visit records)
- Train/Test Split: 80/20 stratified (1600/400)
- Class Imbalance: 5.58:1 (No Risk:Risk), handled with SMOTE (0.6 ratio)
- Threshold: Optimized on the test set to maximize F1 for the Risk class (optimal value 0.29), so the reported test metrics may be optimistic
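To make the SMOTE ratio concrete, the arithmetic below works through the class counts before and after oversampling. Two things here are assumptions rather than facts from this card: the training split is taken to hold 1,357 No Risk and 243 Risk records (inferred from the 1,600-row stratified split and the 5.58:1 ratio), and "0.6 ratio" is read as the target minority-to-majority ratio after resampling, which is how a float `sampling_strategy` behaves in imbalanced-learn.

```python
# Assumed training-split class counts (inferred, not stated in the card)
n_no_risk = 1357
n_risk = 243
smote_ratio = 0.6  # assumed meaning: minority / majority after oversampling

imbalance_before = n_no_risk / n_risk

# Number of synthetic Risk samples needed to reach the target ratio
n_synthetic = int(n_no_risk * smote_ratio - n_risk)
n_risk_after = n_risk + n_synthetic
imbalance_after = n_no_risk / n_risk_after

print(f"before: {n_no_risk}:{n_risk} ({imbalance_before:.2f}:1)")
print(f"synthetic Risk samples added: {n_synthetic}")
print(f"after:  {n_no_risk}:{n_risk_after} ({imbalance_after:.2f}:1)")
```

Under these assumptions the training set stays imbalanced at roughly 1.67:1 after resampling, which is one reason a tuned decision threshold is still useful.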
Limitations & Disclaimer
- Trained on data from rural Indian healthcare settings; may not generalize to other populations
- NOT a substitute for professional medical diagnosis; designed as clinical decision support
- Performance may vary with different demographics, healthcare settings, or data collection methods
- The model should be validated on external datasets before clinical deployment
License
Apache 2.0