Customer Churn Analysis

1. Executive Summary
This report presents a data-driven approach to reducing customer churn for an e-commerce business experiencing high attrition and declining quarterly revenue. The analysis covers 50,000 customers across 25 features spanning demographics, platform engagement, purchase behavior, customer service interactions, and financials.
An XGBoost classifier was selected as the best-performing churn prediction model after evaluating four candidates. It achieves a ROC-AUC of 0.929 and correctly identifies 85% of customers who go on to churn. The total at-risk revenue across all segments is $19.7M. A conservative estimate of 20% retention improvement from targeted campaigns yields $3.95M in potential savings, with 85% of that value concentrated in the high-risk tier.
End-to-End Pipeline
- EDA: 50K customers, 25 features
- Engineer: Cart Friction, Engagement Score
- Model: 4 classifiers tuned and compared
- Segment: SHAP clustering, 3 archetypes
- Strategy: targeted campaigns, $3.95M savings
2. Key Findings
Top Churn Drivers
Feature importance was assessed with four methods (Mutual Information, Random Forest, Permutation Importance, and SHAP), and the top drivers were identified by each feature's average rank across the four.
Figure: Feature Importance (Avg Rank Across 4 Methods)
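Averaging ranks across importance methods can be sketched as below. This is a minimal illustration, not the report's actual computation: the feature names (`feature_a`, `feature_b`, `feature_c`) and all scores are hypothetical placeholders, and ties are not handled.

```python
from statistics import mean

# Hypothetical importance scores (higher = more important).
# Placeholder values only; these are NOT the report's results.
scores = {
    "mutual_information": {"feature_a": 0.12, "feature_b": 0.30, "feature_c": 0.05},
    "random_forest": {"feature_a": 0.40, "feature_b": 0.35, "feature_c": 0.25},
    "permutation": {"feature_a": 0.08, "feature_b": 0.02, "feature_c": 0.01},
    "shap": {"feature_a": 0.70, "feature_b": 0.60, "feature_c": 0.10},
}


def average_ranks(scores_by_method):
    """Rank features within each method (1 = most important), then average."""
    ranks = {}
    for method_scores in scores_by_method.values():
        ordered = sorted(method_scores, key=method_scores.get, reverse=True)
        for position, feature in enumerate(ordered, start=1):
            ranks.setdefault(feature, []).append(position)
    return {feature: mean(positions) for feature, positions in ranks.items()}


avg_rank = average_ranks(scores)
for feature, rank in sorted(avg_rank.items(), key=lambda item: item[1]):
    print(f"{feature}: average rank {rank:.2f}")
```

Ranking within each method before averaging puts the four methods on a common scale, since their raw scores are not directly comparable.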
A notable finding: Lifetime Value (LTV) showed near-zero linear correlation with churn (-0.011), yet ranked #2 overall in tree-based importance methods. This points to a strong non-linear relationship that linear correlation analysis cannot capture.
Key Insight: Only tree-based methods detected LTV's non-linear churn relationship, which helps explain why the ensemble models outperform logistic regression here.
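The gap between linear correlation and non-linear dependence can be illustrated with a toy example. The report does not state the shape of the LTV relationship, so the U-shaped pattern below (churn only at both extremes of a centered feature) is purely an assumption chosen for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: a centered feature where only the extremes churn.
values = list(range(-10, 11))
churned = [1 if abs(v) > 5 else 0 for v in values]

r = pearson(values, churned)

# A single tree-style split on |value| separates the classes perfectly.
extreme = [c for v, c in zip(values, churned) if abs(v) > 5]
middle = [c for v, c in zip(values, churned) if abs(v) <= 5]
extreme_rate = sum(extreme) / len(extreme)
middle_rate = sum(middle) / len(middle)

print(f"Pearson r: {r:.3f}")
print(f"Churn rate at extremes: {extreme_rate:.0%}, in the middle: {middle_rate:.0%}")
```

In this toy case the positive and negative halves cancel in the correlation, while a threshold-based learner can still separate churners from non-churners.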
3. Model Performance
Four models were trained with class-weight balancing to handle the 71/29 class imbalance, then tuned via RandomizedSearchCV (50 iterations, 5-fold stratified cross-validation, scored on ROC-AUC):
Model ROC-AUC Comparison
* Selected model: XGBoost (best precision/recall balance)
| Model | Accuracy | Precision | Recall | F1 | ROC-AUC |
|---|---|---|---|---|---|
| XGBoost | 0.914 | 0.852 | 0.851 | 0.851 | 0.929 |
| LightGBM | 0.912 | 0.847 | 0.849 | 0.848 | 0.929 |
| Random Forest | 0.915 | 0.886 | 0.811 | 0.847 | 0.926 |
| Logistic Regression | 0.748 | 0.548 | 0.735 | 0.628 | 0.814 |
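The tuning setup described in this section (class-weight balancing, randomized search, 5-fold stratified cross-validation, ROC-AUC scoring) might look roughly like the sketch below using scikit-learn. It is not the report's code: a Random Forest stands in for all four candidates, the search space is hypothetical because the report does not list the tuned ranges, and `make_demo_data` generates synthetic data with a 71/29 class split in place of the real customer table.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold


def make_demo_data(n_samples=1000, seed=42):
    """Synthetic stand-in for the customer data: 25 features, 71/29 classes."""
    return make_classification(
        n_samples=n_samples,
        n_features=25,
        n_informative=8,
        weights=[0.71, 0.29],
        random_state=seed,
    )


def tune_classifier(X, y, n_iter=50, seed=42):
    """Randomized search with 5-fold stratified CV, scored on ROC-AUC."""
    # class_weight="balanced" reweights classes inversely to their frequency.
    model = RandomForestClassifier(class_weight="balanced", random_state=seed)
    # Hypothetical search space (assumption; not taken from the report).
    param_distributions = {
        "n_estimators": randint(20, 60),
        "max_depth": randint(3, 10),
        "min_samples_leaf": randint(1, 10),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = RandomizedSearchCV(
        model,
        param_distributions,
        n_iter=n_iter,
        scoring="roc_auc",
        cv=cv,
        random_state=seed,
    )
    search.fit(X, y)
    return search
```

Calling `tune_classifier(*make_demo_data())` returns the fitted search object; `best_params_` and `best_score_` then hold the winning configuration and its mean cross-validated ROC-AUC. Stratified folds keep the churn rate roughly constant across splits, which matters with imbalanced classes.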
4. Customer Segments
Risk Tier Overview
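Customer Risk Distribution
| Risk Tier | Share of Customers | Customers | Churn Rate |
|---|---|---|---|
| Low Risk | 65.6% | 32,818 | 3.7% |
| Medium Risk | 7.7% | 3,845 | 26.3% |
| High Risk | 26.7% | 13,337 | 91.8% |
The Medium Risk customer count is derived as the remainder of the 50,000 total.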
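As a quick consistency check, the tier shares and churn rates reported above can be combined into an overall churn rate and compared with the 29% minority class cited in the modeling section. The sketch uses only the rounded figures from this report, so small rounding differences are expected.

```python
# (share of customers, churn rate) per risk tier, as reported in this section.
tiers = {
    "Low": (0.656, 0.037),
    "Medium": (0.077, 0.263),
    "High": (0.267, 0.918),
}

# Overall churn rate is the share-weighted average of the tier churn rates.
overall_churn = sum(share * rate for share, rate in tiers.values())

# Fraction of all churners who fall in the high-risk tier.
high_share, high_rate = tiers["High"]
high_tier_churner_share = high_share * high_rate / overall_churn

print(f"Overall churn rate: {overall_churn:.1%}")
print(f"Share of churners in the high-risk tier: {high_tier_churner_share:.1%}")
```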
High-Risk Archetypes
SHAP values were computed for all high-risk customers and clustered using K-Means (k=3, selected by silhouette score). Three distinct archetypes emerged:
| Archetype | Key Features |
|---|---|
| High-Value Frustrated | Service Calls, Cart Abandonment, LTV |
| Cart Abandoners | Cart Abandonment, Cart Friction, LTV |
| Price Sensitive | Age, Discount Usage, LTV |
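Selecting k by silhouette score, as described for the archetype clustering, can be sketched with scikit-learn as follows. This is an assumption-laden illustration: the real per-customer SHAP matrix is replaced by synthetic blobs (`shap_like`), and the candidate range of 2 to 6 clusters is hypothetical, since the report does not state which values of k were compared.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def pick_k_by_silhouette(X, candidates=range(2, 7), seed=0):
    """Fit K-Means for each candidate k and keep the best silhouette score."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for a (customers x features) matrix of SHAP values:
# three well-separated groups in five dimensions (assumption for the demo).
centers = np.array([
    [8.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 8.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 8.0, 0.0, 0.0],
])
shap_like, _ = make_blobs(
    n_samples=300, centers=centers, cluster_std=1.0, random_state=0
)

best_k, scores = pick_k_by_silhouette(shap_like)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print(f"Selected k: {best_k}")
```

Clustering SHAP values rather than raw features groups customers by why the model flags them, which is what makes the resulting archetypes actionable for campaign design.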
5. Expected Business Impact
Revenue at Risk & Potential Savings
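The savings estimates follow from applying the 20% retention improvement to at-risk revenue. A minimal sketch of that arithmetic is shown below, using the total from the Executive Summary and the per-segment at-risk figures listed in Section 6. Because these inputs are rounded, results can differ slightly from the reported $2.09M, $523K, and $3.95M.

```python
RETENTION_IMPROVEMENT = 0.20  # conservative assumption stated in the report

# At-risk revenue in millions of USD (rounded figures from this report).
TOTAL_AT_RISK_MUSD = 19.7
at_risk_musd = {
    "High-Value Service Frustrated": 10.5,
    "Cart Abandoners": 4.5,
    "Price Sensitive": 2.6,
}


def potential_savings(at_risk, uplift=RETENTION_IMPROVEMENT):
    """Revenue retained if `uplift` of the at-risk revenue is saved."""
    return at_risk * uplift


segment_savings = {
    name: potential_savings(value) for name, value in at_risk_musd.items()
}
total_savings = potential_savings(TOTAL_AT_RISK_MUSD)

for name, value in segment_savings.items():
    print(f"{name}: ${value:.2f}M")
print(f"All segments: ${total_savings:.2f}M")
```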
6. Recommended Strategies
High-Value Service Frustrated
4,063 customers
At-risk: $10.5M
Savings: $2.09M
- Assign dedicated account managers to top-LTV customers
- Proactive outreach acknowledging past service issues
- Priority ticket routing with 4-hour response SLA
- Account credit and free shipping as service recovery
Cart Abandoners
7,387 customers
At-risk: $4.5M
Savings: $900K
- Automated cart abandonment email within 1 hour of drop-off
- Free shipping threshold set above typical cart value
- Checkout UX simplification and guest checkout option
- Push notification with 10% incentive on abandoned items
Price Sensitive
1,887 customers
At-risk: $2.6M
Savings: $523K
- Tiered loyalty discounts (spend more, save more)
- Bundle offers to increase average order value
- Age-appropriate product recommendations and channels
