Customer Churn Prediction
Customer Churn
As of September 2021, a rough estimate for customer acquisition cost (CAC) could range from $50 to $300 or more per customer (personal communication, ChatGPT, August 2023). A general guideline for a good attrition rate for subscription-based online businesses was typically around 5% per month. However, it is essential to note that what constitutes a "good" attrition rate can differ significantly by industry, business model, engagement activities, competitive actions, pricing, value proposition, and service quality. Predicting and reducing customer churn, or attrition, is an interesting challenge. Causality-related considerations are key when predicting customer churn. Some of them are as follows:
- Correlation vs. Causation: Distinguishing between correlated factors and actual causes of churn.
- Confounding Variables: Identifying and accounting for variables that may distort causal relationships.
- Time Lag: Dealing with time delays between causes and the manifestation of churn.
- Reverse Causality: Recognizing situations where churn itself influences the identified causes.
- Hidden Causes: Discovering latent or unobservable factors contributing to churn.
- Feedback Loops: Handling situations where churn and its causes create feedback loops.
- Interventions: Understanding how interventions to reduce churn may impact causal relationships.
- Data Quality: Ensuring that data used to establish causal links is accurate and comprehensive.
At an aggregate level, the churn/attrition percentage is an end-of-period fact. Customer churn rate is the percentage of customers who have gone from active to inactive during the analysis period. The denominator of the churn rate is the number of active customers at the beginning of the period, and the numerator is the number of customers who churned.
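In formula form:

$$\text{Churn Rate} = \frac{\text{customers who went from active to inactive during the period}}{\text{active customers at the beginning of the period}} \times 100\%$$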
A predictive model links seemingly disparate events during the observation period, e.g. a low wallet balance or app errors while loading money, to the probability that a customer may never return due to a degraded experience. A predictive model trained on such historical cause-and-effect data predicts the probability of churn in the prediction period. Assuming the underlying causation does not change, a churn model can be a valuable asset for designing interventions to improve retention.
Introducing the problem
The problem at hand is from the fintech industry: a payments app used for paying in a hyperlocal market. While fingerprinting the consumers is easy, it is often hard to understand what leads to early-adopter disengagement. Given the nature of the problem, there is a plethora of streaming data that can be captured in a big data system, such as (but not limited to) app-related issues, wallet-charging issues, wallet balance, time to first transaction, open marketing offers, etc.
Overall, this problem looks at the history of consumer experiences and subsequent behavior, and builds a model to predict behavior in the future. For example, if a low wallet balance in the past has led to zero transactions in more recent times, the model generalizes that a current low balance increases the probability of churn in the future. Since the market is hyperlocal, consumer profile attributes such as age, gender, and purchase profile are not great predictors of variability.
The goal is to build a machine learning classification model that can predict the probability of churn for current customers based on the attrition behavior observed in the past, i.e. the probability of 'no future interaction'.
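As a minimal sketch of how the observation and prediction windows could translate into features and a label, assuming a hypothetical transactions file with hypothetical column names (customer_id, txn_date, txn_id, wallet_balance, app_error_flag) and arbitrary cut-off dates:

```python
import pandas as pd

# Assumed cut-off dates: behaviour features come from the observation window,
# the churn label from the prediction window that follows it.
OBS_END = pd.Timestamp("2021-06-30")   # end of observation period (assumed)
PRED_END = pd.Timestamp("2021-09-30")  # end of prediction period (assumed)

# Hypothetical event-level data captured by the app
txns = pd.read_csv("transactions.csv", parse_dates=["txn_date"])

# Behaviour features from the observation window
obs = txns[txns["txn_date"] <= OBS_END]
features = obs.groupby("customer_id").agg(
    wallet_balance=("wallet_balance", "last"),   # latest balance in the window
    app_errors=("app_error_flag", "sum"),        # count of app issues
    txn_count=("txn_id", "count"),               # activity level
)

# Label: 1 = "no future interaction" in the prediction window
pred_window = txns[(txns["txn_date"] > OBS_END) & (txns["txn_date"] <= PRED_END)]
still_active = set(pred_window["customer_id"])
features["churn"] = (~features.index.isin(still_active)).astype(int)
```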
Exploratory Analysis
Univariate visualization can show some interesting insights. For example, a low balance (<= $5) shows a higher proportion of churn; similarly, consumers with a Length of Relationship greater than 30 tend to churn less.
The bivariate jointplot shows an ellipsoidal area with a large concentration of churns. This is perhaps due to the interaction of two bimodal distributions.
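A minimal sketch of how these views could be produced with seaborn, assuming the features dataframe from the earlier sketch also carries a length_of_relationship column:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Univariate view: churn proportion for low vs. non-low wallet balance
features["low_balance"] = features["wallet_balance"] <= 5
sns.barplot(data=features, x="low_balance", y="churn")
plt.title("Churn proportion by low balance (<= $5)")
plt.show()

# Bivariate view: joint distribution of balance and length of relationship,
# split by churn, to look for the concentrated ellipsoidal region
sns.jointplot(
    data=features,
    x="wallet_balance",
    y="length_of_relationship",  # assumed column
    hue="churn",
    kind="kde",
)
plt.show()
```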
Predicting Churn
Three machine learning models for binary classification, logistic regression, random forest, and XGBoost, are developed with hyperparameter tuning, and the F1 score is used as the quality metric. The following are the ROC curves:
The following are the classification metrics:
Model | train_f1_score | train_auc_score | train_precision | train_recall | test_f1_score | test_auc_score | test_precision | test_recall
---|---|---|---|---|---|---|---|---
Logistic Regression | 0.729809 | 0.724550 | 0.716133 | 0.74405 | 0.203343 | 0.800579 | 0.118699 | 0.118699
Random Forest | 0.990381 | 0.990283 | 0.980957 | 1.00000 | 0.556150 | 0.875905 | 0.619048 | 0.619048
XGBoost | 0.985542 | 0.985325 | 0.971506 | 1.00000 | 0.553191 | 0.882587 | 0.492424 | 0.492424
The recall rate is key, as it determines the ability to target at-risk customers with retention engagement. The choice of model is also influenced by factors such as budget. In this case, the Random Forest model ought to be used.
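For reference, a minimal sketch of how the three models could be tuned and scored, assuming the feature matrix X and label y derived from the earlier sketches and standard scikit-learn/XGBoost APIs (the grids and parameters here are illustrative only):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

X = features.drop(columns=["churn"])
y = features["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

candidates = {
    "Logistic Regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "Random Forest": (RandomForestClassifier(random_state=42),
                      {"n_estimators": [200, 500], "max_depth": [None, 10]}),
    "XGBoost": (XGBClassifier(eval_metric="logloss"),
                {"n_estimators": [200, 500], "max_depth": [3, 6]}),
}

for name, (estimator, grid) in candidates.items():
    # Hyperparameter tuning with F1 as the quality metric
    search = GridSearchCV(estimator, grid, scoring="f1", cv=5)
    search.fit(X_train, y_train)
    proba = search.predict_proba(X_test)[:, 1]
    pred = search.predict(X_test)
    print(
        f"{name}: f1={f1_score(y_test, pred):.3f} "
        f"auc={roc_auc_score(y_test, proba):.3f} "
        f"precision={precision_score(y_test, pred):.3f} "
        f"recall={recall_score(y_test, pred):.3f}"
    )
```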
Recall Rate
The recall rate clearly needs to be maximized in the current context. The following can be used as next steps:
- Data Augmentation: Increase the representation of the minority class by generating synthetic data points or using techniques like SMOTE (Synthetic Minority Over-sampling Technique), as sketched after this list.
- Feature Engineering: Carefully select and engineer features that highlight important patterns and characteristics of both classes, aiding the model's ability to distinguish between them.
- Algorithm Selection: Choose algorithms that are well-suited for imbalanced datasets, such as Random Forests, Gradient Boosting, or Support Vector Machines, as they can capture complex relationships better.
- Threshold Adjustment: Modify the classification threshold to prioritize recall over precision, particularly when false negatives are more costly than false positives.
- Ensemble Methods: Combine multiple models, such as bagging or boosting, to improve overall performance and recall by leveraging the strengths of different algorithms or subsets of the data.
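As an illustration of the data augmentation and threshold adjustment points, a minimal sketch assuming the imbalanced-learn package and the train/test splits from the earlier sketch:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score

# Oversample the minority (churn) class inside the pipeline so that
# resampling is applied only to the training data
pipe = Pipeline([
    ("smote", SMOTE(random_state=42)),
    ("rf", RandomForestClassifier(random_state=42)),
])
pipe.fit(X_train, y_train)

# Threshold adjustment: trade precision for recall by lowering the cut-off
proba = pipe.predict_proba(X_test)[:, 1]
for threshold in (0.5, 0.4, 0.3):
    pred = (proba >= threshold).astype(int)
    print(
        f"threshold={threshold}: "
        f"recall={recall_score(y_test, pred):.3f}, "
        f"precision={precision_score(y_test, pred):.3f}"
    )
```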