Yusuf Musa
Machine Learning

Customer Churn Analysis

50K Customers Analyzed
0.929 ROC-AUC Score
$19.7M At-Risk Revenue
$3.95M Potential Savings
By Yusuf Musa

1. Executive Summary

This report presents a data-driven approach to reducing customer churn for an e-commerce business experiencing high attrition and declining quarterly revenue. The analysis covers 50,000 customers across 25 features spanning demographics, platform engagement, purchase behavior, customer service interactions, and financials.

An XGBoost classifier was selected as the best-performing churn prediction model after evaluating four candidates. It achieves a ROC-AUC of 0.929 and correctly identifies 85% of customers who go on to churn. The total at-risk revenue across all segments is $19.7M. A conservative estimate of 20% retention improvement from targeted campaigns yields $3.95M in potential savings, with 85% of that value concentrated in the high-risk tier.

End-to-End Pipeline

  1. EDA: 50K customers, 25 features
  2. Engineer: Cart Friction, Engagement Score
  3. Model: 4 classifiers tuned and compared
  4. Segment: SHAP clustering, 3 archetypes
  5. Strategy: targeted campaigns, $3.95M savings

2. Key Findings

Top Churn Drivers

Feature importance analysis using four methods (Mutual Information, Random Forest, Permutation Importance, and SHAP) revealed the following top drivers:

Feature Importance (Avg Rank Across 4 Methods)

Feature                         Direction                  Avg Rank
Customer Service Calls          More calls = more churn    1.5
Lifetime Value                  Non-linear relationship    2.8
Cart Abandonment Rate           Higher = more churn        3.2
Cart Friction (engineered)      Higher = more churn        5.8
Engagement Score (engineered)   Lower = more churn         6.5
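
The average rank above is a simple aggregation: each method ranks every feature by its importance score, and the ranks are then averaged per feature. A minimal sketch of that aggregation follows. The scores are made-up placeholders rather than values from this analysis, `average_rank` is a hypothetical helper name, and ties are not given any special treatment.

```python
def average_rank(scores_by_method):
    """Rank features within each method (1 = most important), then average."""
    ranks = {}
    for scores in scores_by_method.values():
        # Sort features by descending score; position in that order is the rank.
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, feature in enumerate(ordered, start=1):
            ranks.setdefault(feature, []).append(rank)
    return {feature: sum(r) / len(r) for feature, r in ranks.items()}


# Placeholder importance scores (illustrative only, not the report's values).
scores_by_method = {
    "mutual_info": {"service_calls": 0.30, "ltv": 0.10, "cart_abandon": 0.20},
    "random_forest": {"service_calls": 0.40, "ltv": 0.35, "cart_abandon": 0.25},
    "permutation": {"service_calls": 0.15, "ltv": 0.12, "cart_abandon": 0.05},
    "shap": {"service_calls": 0.22, "ltv": 0.25, "cart_abandon": 0.18},
}

avg = average_rank(scores_by_method)
for feature, rank in sorted(avg.items(), key=lambda kv: kv[1]):
    print(f"{feature}: {rank:.2f}")
```

Averaging ranks rather than raw scores keeps methods with different score scales (for example, mutual information versus mean absolute SHAP value) comparable.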

A notable finding: Lifetime Value (LTV) showed near-zero linear correlation with churn (-0.011), yet ranked #2 overall in tree-based importance methods. This points to a strong non-linear relationship that linear correlation analysis cannot detect.

Key Insight: Because LTV's relationship with churn is non-linear, only the tree-based methods detected it. This helps explain why the ensemble models outperform logistic regression here.
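
To illustrate how a feature can matter a great deal while showing almost no linear correlation, here is a small synthetic sketch. The U-shaped relationship is an assumption chosen for illustration; the report does not state the actual shape of the LTV effect, and none of the numbers below come from the dataset.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)

# Synthetic, centered "LTV" values (assumption: symmetric around a midpoint).
ltv = [rng.uniform(-1, 1) for _ in range(10_000)]

# Assumed U-shape: churn is likely at both extremes, unlikely in the middle.
churn = [1 if rng.random() < 0.1 + 0.8 * x * x else 0 for x in ltv]

r = pearson(ltv, churn)

# A single split on distance from the midpoint separates the groups clearly,
# which is the kind of structure a tree can find and a straight line cannot.
extreme = [c for x, c in zip(ltv, churn) if abs(x) > 0.5]
middle = [c for x, c in zip(ltv, churn) if abs(x) <= 0.5]
rate_extreme = sum(extreme) / len(extreme)
rate_middle = sum(middle) / len(middle)

print(f"Pearson r: {r:.3f}")
print(f"Churn rate at extremes: {rate_extreme:.2f}, in the middle: {rate_middle:.2f}")
```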

3. Model Performance

Four models were trained with class-weight balancing to handle the 71/29 class imbalance (roughly 71% retained, 29% churned), then tuned via RandomizedSearchCV (50 iterations, 5-fold stratified cross-validation, scored on ROC-AUC):

Model ROC-AUC Comparison

XGBoost *        0.929
LightGBM         0.929
Random Forest    0.926
Logistic Reg.    0.814

* Selected model: best precision/recall balance
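
The tuning setup described in this section (class weighting, 50 random-search iterations, 5-fold stratified cross-validation, ROC-AUC scoring) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the report's actual code: it uses logistic regression and synthetic data to stay lightweight, the search space for `C` is invented for the example, and `tune` is a hypothetical helper. The same pattern applies when the estimator is a gradient-boosted model.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold


def tune(X, y, n_iter=50, seed=0):
    """Randomized search scored on ROC-AUC with 5-fold stratified CV."""
    search = RandomizedSearchCV(
        # class_weight="balanced" reweights classes inversely to frequency.
        estimator=LogisticRegression(class_weight="balanced", max_iter=1000),
        # Illustrative search space (assumption, not from the report).
        param_distributions={"C": loguniform(1e-3, 1e2)},
        n_iter=n_iter,
        scoring="roc_auc",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
        random_state=seed,
    )
    return search.fit(X, y)


# Synthetic stand-in data with roughly a 71/29 class split.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.71, 0.29], random_state=0
)
search = tune(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Stratified folds keep the class ratio roughly constant in every fold, which matters when one class is less than a third of the data.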

Model                 Accuracy   Precision   Recall   F1      ROC-AUC
XGBoost               0.914      0.852       0.851    0.851   0.929
LightGBM              0.912      0.847       0.849    0.848   0.929
Random Forest         0.915      0.886       0.811    0.847   0.926
Logistic Regression   0.748      0.548       0.735    0.628   0.814
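
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The sketch below does that for the reported values; `f1_from_pr` is a hypothetical helper name, and small differences are expected because the inputs are rounded to three decimals.

```python
def f1_from_pr(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in the table above.
reported = {
    "XGBoost": (0.852, 0.851, 0.851),
    "LightGBM": (0.847, 0.849, 0.848),
    "Random Forest": (0.886, 0.811, 0.847),
    "Logistic Regression": (0.548, 0.735, 0.628),
}

for model, (p, r, f1) in reported.items():
    print(f"{model}: computed F1 = {f1_from_pr(p, r):.3f}, reported = {f1:.3f}")
```

The comparison also shows the trade-off behind the model choice: Random Forest has the highest precision but gives up recall, while XGBoost keeps the two nearly equal.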

4. Customer Segments

Risk Tier Overview

Customer Risk Distribution

  • Low Risk: 65.6% of customers (32,818), 3.7% churn
  • Medium Risk: 7.7% of customers, 26.3% churn
  • High Risk: 26.7% of customers (13,337), 91.8% churn
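
The tier figures can be cross-checked against the overall class balance: weighting each tier's churn rate by its share of customers should land near the 29% churn rate implied by the 71/29 split. A short sketch of that arithmetic, assuming the 26.3% shown for the medium tier is its churn rate:

```python
# (share of customers, churn rate) per tier, as listed above.
# Assumption: 26.3% is the medium tier's churn rate.
tiers = {
    "low": (0.656, 0.037),
    "medium": (0.077, 0.263),
    "high": (0.267, 0.918),
}

total_share = sum(share for share, _ in tiers.values())
overall_churn = sum(share * churn for share, churn in tiers.values())

# Fraction of all churners that fall in the high-risk tier.
high_share, high_churn = tiers["high"]
high_tier_churner_share = high_share * high_churn / overall_churn

print(f"Shares sum to {total_share:.3f}")
print(f"Overall churn rate: {overall_churn:.1%}")
print(f"Share of churners in high-risk tier: {high_tier_churner_share:.1%}")
```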

High-Risk Archetypes

SHAP values were computed for all high-risk customers and clustered using K-Means (k=3, selected by silhouette score). Three distinct archetypes emerged:

High-Value Frustrated
  • 4,063 customers, churn: 100%, avg LTV: $2,575
  • Top drivers: Service Calls, Cart Abandonment, LTV

Cart Abandoners
  • 7,387 customers, churn: 85.1%, avg LTV: $715
  • Top drivers: Cart Abandonment, Cart Friction, LTV

Price Sensitive
  • 1,887 customers, churn: 100%, avg LTV: $1,387
  • Top drivers: Age, Discount Usage, LTV
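
The segmentation step, K-Means over per-customer SHAP values with k chosen by silhouette score, can be sketched as below. This is only an illustration of the selection procedure: the input is a synthetic matrix standing in for real SHAP values, the candidate range for k is an assumption, and `select_k` is a hypothetical helper.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def select_k(values, candidates=range(2, 7), seed=0):
    """Return (best_k, scores), where best_k maximizes the silhouette score."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(values)
        scores[k] = silhouette_score(values, labels)
    return max(scores, key=scores.get), scores


# Synthetic stand-in for a (customers x features) matrix of SHAP values.
shap_like, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)

best_k, scores = select_k(shap_like)
print("best k:", best_k)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
```

Clustering on SHAP values rather than raw features groups customers by why the model flags them, which is what makes the resulting archetypes directly actionable.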

5. Expected Business Impact

Revenue at Risk & Potential Savings

Segment                 At Risk   Savings   Share of Total
High-Value Frustrated   $10.5M    $2.09M    53%
Cart Abandoners         $4.5M     $900K     23%
Price Sensitive         $2.6M     $523K     13%
Medium Risk             $1.1M     $215K     5%
Low Risk                $1.8M     $362K     9%
85% of savings ($3.5M) concentrated in high-risk tier
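
The savings figures follow from applying the assumed 20% retention improvement to each segment's at-risk revenue. A minimal sketch of that calculation is shown below. The inputs are the rounded at-risk values quoted above, so results can differ slightly from the reported savings (for example, $2.10M here versus $2.09M reported for High-Value Frustrated).

```python
RETENTION_LIFT = 0.20  # assumed retention improvement from targeted campaigns

# At-risk revenue per segment in millions of USD (rounded values from above).
at_risk_musd = {
    "High-Value Frustrated": 10.5,
    "Cart Abandoners": 4.5,
    "Price Sensitive": 2.6,
    "Medium Risk": 1.1,
    "Low Risk": 1.8,
}
HIGH_RISK = ("High-Value Frustrated", "Cart Abandoners", "Price Sensitive")

savings_musd = {seg: rev * RETENTION_LIFT for seg, rev in at_risk_musd.items()}
high_risk_savings = sum(savings_musd[seg] for seg in HIGH_RISK)

for seg, saved in savings_musd.items():
    print(f"{seg}: ${saved:.2f}M")
print(f"High-risk tier total: ${high_risk_savings:.2f}M")
```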

6. Recommended Strategies

P1: High-Value Frustrated

4,063 customers | At-risk: $10.5M | Savings: $2.09M

  • Assign dedicated account managers to top-LTV customers
  • Proactive outreach acknowledging past service issues
  • Priority ticket routing with 4-hour response SLA
  • Account credit and free shipping as service recovery

P2: Cart Abandoners

7,387 customers | At-risk: $4.5M | Savings: $900K

  • Automated cart abandonment email within 1 hour of drop-off
  • Free shipping threshold set above typical cart value
  • Checkout UX simplification and guest checkout option
  • Push notification with 10% incentive on abandoned items

P3: Price Sensitive

1,887 customers | At-risk: $2.6M | Savings: $523K

  • Tiered loyalty discounts (spend more, save more)
  • Bundle offers to increase average order value
  • Age-appropriate product recommendations and channels