Model Card for Customer Churn Prediction Pipeline
This model is a trained Scikit-learn pipeline designed to predict whether a telecom customer is likely to churn based on account, service, and billing attributes.
Model Details
Model Description
This model acts as a churn-risk scoring engine for retention workflows. It combines preprocessing (imputation, scaling, one-hot encoding) and classification in a single serialized pipeline artifact for consistent training and inference behavior.
- Developed by: Aashir Hameed
- Model type: Scikit-learn Tabular Classification Pipeline
- Language(s): English (en) for feature labels/documentation
- License: Apache 2.0
- Trained from: Telco customer churn tabular dataset
Model Sources
- Repository: GitHub: aashir92/Customer_Churn_Prediction
- Model: Hugging Face: Aashir92/Customer-Churn-Prediction
- Demo: Hugging Face Spaces Live UI
Uses
Direct Use
This model is intended for churn risk scoring in:
- CRM prioritization and retention campaigns
- Proactive outreach workflows for high-risk customers
- Batch scoring of customer cohorts
Binary output mapping:
- 0: No Churn
- 1: Churn
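The mapping above can be applied to raw model output with a small lookup. This is a minimal sketch; `CHURN_LABELS` and `to_labels` are hypothetical names that are not part of the published pipeline.

```python
# Hypothetical helper; only the 0/1 mapping itself comes from the model card.
CHURN_LABELS = {0: "No Churn", 1: "Churn"}


def to_labels(predictions):
    """Map raw 0/1 predictions to readable labels."""
    # int() also accepts NumPy integer types returned by model.predict.
    return [CHURN_LABELS[int(p)] for p in predictions]


print(to_labels([0, 1, 1]))  # ['No Churn', 'Churn', 'Churn']
```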
Out-of-Scope Use
This model is not intended for:
- Causal inference on churn drivers
- Fairness-critical automated decisions without human review
- Data distributions that significantly differ from the Telco training data
Bias, Risks, and Limitations
Like all supervised models, this pipeline may reflect historical biases and collection artifacts present in the source data. Prediction confidence can degrade under distribution shift (for example, new plans, pricing structures, or service bundles not represented in the training data). The model should be monitored for drift and recalibrated or retrained on a regular schedule.
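The model card does not say which drift check is used. One common option for a single numeric feature is the population stability index (PSI), sketched below with the standard library only. The function name, the ten equal-width bins, and the toy `tenure` values are all assumptions made for illustration.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a new sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / n_bins

    def bin_shares(values: Sequence[float]) -> list:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge bins.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy data: tenure at training time versus a hypothetical later cohort.
train_tenure = list(range(1, 73))
later_tenure = [t + 20 for t in train_tenure]
print(population_stability_index(train_tenure, train_tenure))
print(population_stability_index(train_tenure, later_tenure))
```

A PSI of zero means the two binned distributions match; larger values indicate a bigger shift. Any alert threshold would need to be chosen for the specific deployment.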
How to Get Started with the Model
Use the code below for inference with joblib:
from pathlib import Path

import joblib
import pandas as pd

# Load the serialized pipeline (preprocessing + classifier in one artifact).
model = joblib.load(Path("churn_model_v1.pkl"))

# One customer record; column names must match the training features.
sample = pd.DataFrame(
    [
        {
            "gender": "Female",
            "SeniorCitizen": "0",
            "Partner": "Yes",
            "Dependents": "No",
            "tenure": 12,
            "PhoneService": "Yes",
            "MultipleLines": "No",
            "InternetService": "Fiber optic",
            "OnlineSecurity": "No",
            "OnlineBackup": "Yes",
            "DeviceProtection": "No",
            "TechSupport": "No",
            "StreamingTV": "Yes",
            "StreamingMovies": "Yes",
            "Contract": "Month-to-month",
            "PaperlessBilling": "Yes",
            "PaymentMethod": "Electronic check",
            "MonthlyCharges": 89.1,
            "TotalCharges": 1069.2,
        }
    ]
)

# Predicted class: 0 = No Churn, 1 = Churn.
prediction = model.predict(sample)[0]
# Probability of the Churn class (column index 1).
probability = model.predict_proba(sample)[0][1]
print(prediction, probability)
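For the batch-scoring and CRM prioritization uses listed earlier, the same `predict_proba` call can score a whole cohort at once. The helper below is a hedged sketch: `rank_by_churn_risk`, the `churn_probability` column name, and the default `top_n` are hypothetical and not part of the published artifact. It assumes the model's classes are ordered as 0 then 1, matching the single-row example.

```python
import pandas as pd


def rank_by_churn_risk(model, customers: pd.DataFrame, top_n: int = 100) -> pd.DataFrame:
    """Return the top_n customers with the highest predicted churn probability."""
    scored = customers.copy()
    # Column 1 of predict_proba is taken as the probability of class 1 (Churn).
    scored["churn_probability"] = model.predict_proba(customers)[:, 1]
    return scored.sort_values("churn_probability", ascending=False).head(top_n)
```

With the pipeline loaded as `model` and a DataFrame of customers that has the same feature columns as `sample`, `rank_by_churn_risk(model, customers, top_n=50)` would return the 50 highest-risk rows for outreach.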
Training Details
Training Data
The model was trained on WA_Fn-UseC_-Telco-Customer-Churn.csv with the standard churn target column (Churn).
Training Procedure
Preprocessing
- Dropped non-predictive customerID
- Coerced TotalCharges to numeric and removed rows with invalid target/critical numeric values
- Numeric preprocessing: median imputation + standard scaling
- Categorical preprocessing: most-frequent imputation + one-hot encoding (handle_unknown='ignore')
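The preprocessing steps above could be expressed with scikit-learn roughly as follows. This is a sketch, not the repository's actual code: the helper names, the choice of `tenure`, `MonthlyCharges`, and `TotalCharges` as the numeric columns, and treating `Churn` and `TotalCharges` as the "critical" columns for row removal are assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed numeric features; everything else is treated as categorical.
NUMERIC_COLS = ["tenure", "MonthlyCharges", "TotalCharges"]


def clean_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop the ID column, coerce TotalCharges, and remove unusable rows."""
    df = raw.drop(columns=["customerID"], errors="ignore").copy()
    # Blank or malformed strings become NaN instead of raising.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    return df.dropna(subset=["Churn", "TotalCharges"])


def build_preprocessor(categorical_cols: list) -> ColumnTransformer:
    """Median impute + scale numerics; most-frequent impute + one-hot categoricals."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, NUMERIC_COLS),
        ("cat", categorical, categorical_cols),
    ])


# Toy rows for illustration only; not taken from the Telco dataset.
raw = pd.DataFrame({
    "customerID": ["a", "b", "c", "d"],
    "tenure": [1, 24, 0, 60],
    "MonthlyCharges": [70.0, 55.5, 20.0, 99.9],
    "TotalCharges": ["70.0", "1332.0", " ", "5994.0"],
    "Contract": ["Month-to-month", "One year", "Two year", "Two year"],
    "Churn": ["Yes", "No", "No", "No"],
})
clean = clean_frame(raw)
features = build_preprocessor(["Contract"]).fit_transform(clean.drop(columns=["Churn"]))
print(clean.shape, features.shape)
```

Setting `handle_unknown="ignore"` means a category unseen during training is encoded as all zeros at inference time rather than raising an error.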
Training Hyperparameters
- Validation: Stratified K-Fold cross-validation (n_splits=5)
- Model search: GridSearchCV with scoring = f1
- Candidates: Logistic Regression and Random Forest
- Winning model: Random Forest
- Best params (winner):
  - class_weight=balanced
  - max_depth=8
  - min_samples_leaf=4
  - min_samples_split=2
  - n_estimators=200
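A search over both model families can be set up in one `GridSearchCV` by swapping the classifier step through the parameter grid. The sketch below runs on synthetic data so it is self-contained. The step name `clf`, the non-winning grid values, the shuffling and seeds, and the synthetic dataset are assumptions; only the scoring metric, the fold count, the two candidate families, and the winning Random Forest values come from the list above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Placeholder classifier; the grid replaces the "clf" step per candidate.
pipeline = Pipeline([("clf", LogisticRegression())])

param_grid = [
    {
        "clf": [LogisticRegression(max_iter=1000)],
        "clf__C": [0.1, 1.0],  # assumed values
    },
    {
        "clf": [RandomForestClassifier(random_state=42)],
        "clf__n_estimators": [200],
        "clf__max_depth": [8, None],  # None is an assumed alternative
        "clf__min_samples_leaf": [4],
        "clf__min_samples_split": [2],
        "clf__class_weight": ["balanced"],
    },
]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)

# Synthetic, imbalanced stand-in for the preprocessed Telco features.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.73, 0.27], random_state=42
)
search.fit(X, y)
print(type(search.best_estimator_.named_steps["clf"]).__name__)
print(round(search.best_score_, 3))
```

In a full pipeline the preprocessing transformer would sit before the `clf` step, so that imputation and scaling are refit inside each fold.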
Evaluation
Testing Data, Factors & Metrics
Testing Data
Held-out split from the Telco dataset with stratified train/test partitioning.
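A stratified hold-out split keeps the churn rate the same in the train and test partitions. The sketch below uses toy data; the `stratified_split` name, the 20% test size, and the seed are assumptions, since the model card does not state them.

```python
from sklearn.model_selection import train_test_split


def stratified_split(features, target, test_size=0.2, seed=42):
    """Split so that each partition preserves the class ratio of `target`."""
    return train_test_split(
        features, target, test_size=test_size, stratify=target, random_state=seed
    )


# Toy data: 100 customers, 25% churners.
X = [[i] for i in range(100)]
y = [1] * 25 + [0] * 75
X_train, X_test, y_train, y_test = stratified_split(X, y)
print(sum(y_train) / len(y_train), sum(y_test) / len(y_test))
```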
Metrics
- Accuracy
- F1-score
Results
- Final Test Accuracy: 75.05%
- Final Test F1-Score: 62.38%
- Best CV F1-score: 63.96%
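Both reported metrics are available in `sklearn.metrics`. The example below uses made-up labels to show the calls; the numbers it prints are not the model's results.

```python
from sklearn.metrics import accuracy_score, f1_score

# Illustrative labels only: 1 = Churn, 0 = No Churn.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
# f1_score treats label 1 (Churn) as the positive class by default.
f1 = f1_score(y_true, y_pred)
print(f"accuracy={accuracy:.2%}  f1={f1:.2%}")
```

Here there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and F1 is 2/3, while accuracy is 6/8. F1 is often the more informative number on an imbalanced target such as churn, because accuracy also rewards the many easy "No Churn" cases.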
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator.
- Hardware Type: Standard local CPU training environment
- Training profile: Classical ML grid-search over two model families
Author & Contact
Aashir Hameed
- Website: aashir92.github.io
- LinkedIn: Aashir Hameed
- GitHub: @aashir92