PROJECT SUMMARY

NAME

Customer Churn Prediction - Machine Learning Project

CONTEXT

In the competitive telecom industry, retaining customers is crucial for maintaining revenue and growth. Customer churn, where customers discontinue their service, poses a significant challenge. By predicting churn, companies can take proactive measures to retain customers and improve overall satisfaction.

GOAL

Develop a machine learning model to predict whether a customer will churn, enabling the company to implement targeted retention strategies.

DATA

1 CSV file: Customer-Churn.csv (7,043 rows, 21 columns)

TECHNIQUES AND LIBRARIES USED:

Data Acquisition: CSV import with Pandas
Data Analytics: Exploratory data analysis to understand patterns and relationships
Data Visualization: Seaborn, Matplotlib, and Plotly
Data Preprocessing: Cleaning, encoding, and scaling with scikit-learn, NumPy, and SciPy
Data Engineering: Feature creation, selection, and imputation
Modeling: Logistic Regression, SVC, Random Forest, and ensemble methods (voting, stacking, boosting) with imbalanced-learn and scikit-learn.
Evaluation: Recall, precision, F1-score, cross-validation, confusion matrix, and learning curve
Tuning: GridSearchCV/RandomizedSearchCV for parameter optimization
Deployment: Model saving with joblib

METHODOLOGY

Data Acquisition and Preparation:

Import Data: Loaded the dataset with Pandas.
Data Cleaning: Handled missing values, corrected data types, and dropped unnecessary columns.
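A minimal cleaning sketch with Pandas. It makes three assumptions that the README does not confirm. First, 'TotalCharges' is read as text with blank entries. Second, 'customerID' is an identifier with no predictive value. Third, a missing total can be filled with 0. The toy rows are invented, and the project notebooks may handle these cases differently.

```python
import pandas as pd

# Toy stand-in for Customer-Churn.csv (invented rows, dataset column names).
raw = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C", "0004-D"],
    "tenure": [1, 34, 0, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75"],  # assumed: text with blanks
    "Churn": ["No", "No", "Yes", "No"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Fix dtypes, fill missing values, and drop the ID column."""
    df = df.copy()
    # Blank strings cannot be parsed; errors="coerce" turns them into NaN.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    # Assumption: a customer with no recorded total has not been billed yet.
    df["TotalCharges"] = df["TotalCharges"].fillna(0.0)
    # Map the target to 0/1 for modeling.
    df["Churn"] = df["Churn"].map({"No": 0, "Yes": 1})
    # The identifier carries no predictive signal.
    return df.drop(columns=["customerID"])

clean_df = clean(raw)
print(clean_df.dtypes)
```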

Exploratory Data Analysis (EDA):

Descriptive Statistics: Calculated summary statistics.
Visualization: Used Seaborn and Matplotlib for histograms, bar plots, and correlation heatmaps.
Target Variable Analysis: Examined churn distribution and identified class imbalance.
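Checking the target distribution takes one Pandas call. In the sketch below, the target is invented to match the reported 26.5% / 73.5% split. The 0.4 cutoff used to flag imbalance is a hypothetical rule of thumb, not a value taken from the project.

```python
import pandas as pd

def churn_distribution(churn: pd.Series) -> pd.Series:
    """Return the share of each target class (the shares sum to 1)."""
    return churn.value_counts(normalize=True)

def is_imbalanced(churn: pd.Series, threshold: float = 0.4) -> bool:
    """Flag imbalance when the minority class is below `threshold` (hypothetical cutoff)."""
    return churn_distribution(churn).min() < threshold

# Invented target reproducing the reported class split.
y = pd.Series(["Yes"] * 265 + ["No"] * 735, name="Churn")
dist = churn_distribution(y)
print(dist.round(3))
print("Imbalanced:", is_imbalanced(y))
```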

Data Preprocessing:

Encoding Categorical Variables: One-hot and target encoding.
Scaling: Standardization/normalization of continuous variables.
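One common way to combine one-hot encoding and standardization is scikit-learn's `ColumnTransformer`. The rows below are invented, and the exact columns the project encodes are an assumption.

Target encoding is left out of the sketch to keep it independent of the installed version. It is available as `sklearn.preprocessing.TargetEncoder` in scikit-learn 1.3 and later.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented rows using column names from the dataset.
X = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month"],
    "PaymentMethod": ["Electronic check", "Mailed check",
                      "Electronic check", "Credit card (automatic)"],
    "tenure": [1, 34, 45, 2],
    "MonthlyCharges": [29.85, 56.95, 42.30, 70.70],
})

categorical = ["Contract", "PaymentMethod"]
numerical = ["tenure", "MonthlyCharges"]

preprocess = ColumnTransformer(
    [
        # Unseen categories become all-zero rows at prediction time.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
        # Rescale continuous columns to zero mean and unit variance.
        ("scale", StandardScaler(), numerical),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_t = preprocess.fit_transform(X)
print(np.round(X_t, 2))
```

Fitting the transformer on the training split only, for example as the first step of a `Pipeline`, keeps test-set statistics out of the scaling.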

Modeling:

Model Selection: Chose several classification models suited to the task, including methods for handling class imbalance.
Initial Training: Established baseline performance with default parameters.
Evaluation Metrics: Prioritized recall, while also considering precision, F1-score, and accuracy.
Parameter Tuning: Grid search and random search.
Cross-Validation: Checked that models generalize and guarded against overfitting.
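The modeling steps above can be combined in a recall-oriented grid search. The sketch uses synthetic imbalanced data from `make_classification` instead of the churn CSV. It also handles imbalance with class weighting, which is only one option; the model and parameter grid are illustrative assumptions.

Two variations are possible. imbalanced-learn's `imblearn.pipeline.Pipeline` accepts resamplers such as `SMOTE` as steps. Swapping in `RandomizedSearchCV` with `param_distributions` and `n_iter` samples the grid instead of trying every combination.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the churn data (about 26.5% positives).
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.735],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

param_grid = {
    "clf__C": [0.01, 0.1, 1.0, 10.0],
    # "balanced" up-weights errors on the minority (churn) class.
    "clf__class_weight": [None, "balanced"],
}

# Stratified folds keep the churn ratio similar across splits; scoring="recall"
# selects the configuration that misses the fewest churners in cross-validation.
search = GridSearchCV(pipe, param_grid, scoring="recall",
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X_train, y_train)

print(search.best_params_)
print("CV recall:", round(search.best_score_, 3))
print("Test recall:", round(search.score(X_test, y_test), 3))
```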

Feature Engineering:

Feature Selection: Evaluated feature importance.
Feature Creation/Deletion: Created new features and removed weak ones to simplify the model.
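Permutation importance is one way to evaluate feature importance. It measures how much a validation metric drops when a single column is shuffled. The README does not say which importance method was used, so this choice is an assumption. The data is synthetic: with `shuffle=False`, only the first three features are informative. The "keep features with positive importance" cutoff is also an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: features 0-2 are informative, features 3-7 are noise.
X, y = make_classification(n_samples=800, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, weights=[0.735],
                           random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]
X_train, X_val, y_train, y_val = train_test_split(X, y, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Mean drop in validation recall over 10 shuffles of each column.
result = permutation_importance(forest, X_val, y_val, scoring="recall",
                                n_repeats=10, random_state=0)

ranking = sorted(zip(names, result.importances_mean), key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")

# Simple, assumed selection rule: keep features whose removal hurts recall.
selected = [name for name, score in ranking if score > 0]
print("Selected:", selected)
```

A Random Forest's built-in `feature_importances_` is faster to obtain. It is impurity-based, however, and can overstate features with many distinct values.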

Model Deployment:

Final Model Selection: Based on recall, overall performance, and the precision/recall tradeoff.
Saving the Model: Using joblib.
Wrapper Class: Facilitated easy deployment and integration.
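A sketch of saving a model with joblib and wrapping it for deployment. The class name `ChurnPredictor`, its methods, and the `threshold` parameter are hypothetical; the repository's actual wrapper may look different. The model is trained on synthetic data and saved under the same file name as the repository's `logistic_regression_model.pkl`. The adjustable threshold is one way to express the recall/precision tradeoff mentioned above.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


class ChurnPredictor:
    """Hypothetical wrapper: loads a saved model and applies a decision threshold."""

    def __init__(self, model_path: str, threshold: float = 0.5):
        self.model = joblib.load(model_path)
        # A lower threshold flags more customers: higher recall, lower precision.
        self.threshold = threshold

    def churn_probability(self, X) -> np.ndarray:
        # Column 1 of predict_proba is P(churn) for a model trained on 0/1 labels.
        return self.model.predict_proba(X)[:, 1]

    def predict(self, X) -> np.ndarray:
        return (self.churn_probability(X) >= self.threshold).astype(int)


# Train a stand-in model on synthetic data and save it to a temporary folder.
X, y = make_classification(n_samples=500, weights=[0.735], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "logistic_regression_model.pkl")
joblib.dump(model, path)

default = ChurnPredictor(path)
recall_oriented = ChurnPredictor(path, threshold=0.3)
print(default.predict(X[:5]), recall_oriented.predict(X[:5]))
```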

Model & Business Recommendations:

Insights and Strategies: Derived actionable insights and proposed strategies to mitigate churn and improve model accuracy.
Implementation Plan: Proposed a plan for targeted interventions.

DATA ANALYSIS

Observations

Churn Rate: 26.5%
Class Imbalance: 26.5% churn vs. 73.5% non-churn.
Variable Types: 17 categorical, 3 numerical.
Influential Factors:
- Demographics: 'SeniorCitizen', 'Partner', 'Dependents'
- Services: 'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport'
- Account: 'Contract', 'PaperlessBilling', 'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'tenure'

Variable Relationships with Churn:

Categorical variables:

Demographic Details:
- 'SeniorCitizen', 'Partner', 'Dependents': Influence churn rates (e.g., senior citizens and customers without a partner or without dependents have higher churn rates).
Services:
- 'InternetService': Strong influence; fiber optic customers account for 69.4% of churned customers.
- Optional internet services: associated with lower churn among internet customers.
Account Details:
- 'Contract': Month-to-month customers account for the largest share of churned customers (88.6%).
- 'PaperlessBilling': Associated with higher churn; 74.9% of churned customers use paperless billing.
- 'PaymentMethod': Electronic check is the most common payment method among churned customers (57.3%).
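Per-category percentages like these can mean two different things:

- The churn rate within a group, P(churn | group).
- A group's share of all churned customers, P(group | churn).

The two can differ widely. A small Pandas sketch on invented data shows both computations side by side.

```python
import pandas as pd

# Invented data: 10 customers, 4 of whom churned.
df = pd.DataFrame({
    "Contract": ["Month-to-month"] * 5 + ["One year"] * 3 + ["Two year"] * 2,
    "Churn":    [1, 1, 1, 0, 0,            1, 0, 0,         0, 0],
})

# Churn rate within each contract type: P(churn | contract).
churn_rate = df.groupby("Contract")["Churn"].mean()

# Share of all churners in each contract type: P(contract | churn).
share_of_churners = df.loc[df["Churn"] == 1, "Contract"].value_counts(normalize=True)

print(churn_rate)          # Month-to-month 0.60, One year 0.33, Two year 0.00
print(share_of_churners)   # Month-to-month 0.75, One year 0.25
```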

Continuous Variables:

'TotalCharges': Lower total charges, higher churn (consistent with churners being newer customers).
'tenure': Shorter tenure, higher churn; 75% of churned customers left within their first 30 months.
'MonthlyCharges': Higher charges, higher churn, with a notable increase in churn probability around $65/month.
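The tenure observation uses the 75th percentile of tenure among churners. It can be complemented by churn rates per tenure band. The sketch uses randomly generated data in which churners are assumed to have shorter tenure. The band edges (12, 24, 48, 72 months) are an illustrative choice, not values from the project.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data (invented): churners' tenure is drawn from a shorter range.
df = pd.DataFrame({
    "tenure": np.concatenate([rng.integers(0, 40, 300), rng.integers(0, 73, 700)]),
    "Churn": np.array([1] * 300 + [0] * 700),
})

# 75% of churners have a tenure at or below this value.
q75_churned = df.loc[df["Churn"] == 1, "tenure"].quantile(0.75)

# Churn rate per tenure band shows where the risk concentrates.
bands = pd.cut(df["tenure"], bins=[0, 12, 24, 48, 72], include_lowest=True)
rate_by_band = df.groupby(bands, observed=True)["Churn"].mean()

print(f"75% of churners had tenure <= {q75_churned:.0f} months")
print(rate_by_band.round(3))
```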

Correlation Analysis:

Internet Services: 'InternetService' is strongly correlated with the optional internet services (+0.61).
Phone Services: 'MultipleLines' and 'PhoneService' are strongly correlated (+0.61).
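Correlations between yes/no service columns can be computed by binary-encoding the columns and applying Pearson correlation, which for 0/1 variables equals the phi coefficient. The rows below are invented and assume a 'No phone service' category in 'MultipleLines'. They show that such correlations can be partly structural: a customer without phone service cannot have multiple lines.

```python
import pandas as pd

# Invented rows; 'No phone service' is assumed to be a MultipleLines category.
df = pd.DataFrame({
    "PhoneService":  ["Yes", "Yes", "No", "Yes", "No", "Yes"],
    "MultipleLines": ["Yes", "No", "No phone service", "Yes",
                      "No phone service", "No"],
})

# Binary-encode: 1 if the customer has the service, else 0.
encoded = pd.DataFrame({
    "PhoneService": (df["PhoneService"] == "Yes").astype(int),
    "MultipleLines": (df["MultipleLines"] == "Yes").astype(int),
})

# Pearson correlation on 0/1 columns (the phi coefficient).
corr = encoded.corr()
print(corr.round(2))  # off-diagonal entries are 0.5 for these rows
```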

CONCLUSION

Possible Solutions to Improve Model Performance

More High-Quality Data: Collect additional customer interaction data.
Feature Engineering: Create new features.
Data Augmentation: Generate synthetic minority-class (churn) samples using SMOTE.
Ensemble Methods: Combine predictions from multiple models.
Dimensionality Reduction: Apply Principal Component Analysis (PCA).
Regularization: Use L1/L2 regularization.
Temporal Analysis: Incorporate time-based features.
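SMOTE creates synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority-class neighbours. The NumPy sketch below shows only that core step, on invented continuous data. In practice, imbalanced-learn's `SMOTE` (via `fit_resample`) would be used, or `SMOTENC` when categorical columns are present. Resampling should also be applied to training folds only, so that synthetic points do not leak into validation data.

```python
import numpy as np

def smote_sketch(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Create n_new synthetic rows by interpolating each chosen minority sample
    toward one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]   # k nearest per sample

    base = rng.integers(0, len(X_min), n_new)           # random minority samples
    pick = neighbours[base, rng.integers(0, k, n_new)]  # one neighbour for each
    gap = rng.random((n_new, 1))                        # position along the segment
    return X_min[base] + gap * (X_min[pick] - X_min[base])

rng = np.random.default_rng(1)
X_minority = rng.normal(size=(20, 3))   # invented churn-class rows
synthetic = smote_sketch(X_minority, n_new=30)
print(synthetic.shape)
```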

Possible Business Solutions to Reduce Customer Churn

Proactive Customer Management

Procedures and warnings for high-risk customers
Early intervention programs

Incentives and Rewards

New customer incentives
High-paying customer incentives
Quick surveys with incentives
Loyalty programs

Customer Engagement and Support

Engage unhappy customers
Enhanced customer support
Personalized communication
Customer feedback loop

Service and Product Improvement

Service Quality Improvement
Customer Education

Data-Driven Strategies

Predictive Analytics

Flexibility and Customization

Flexible Contract Options
Tailored Offers and Promotions

REPOSITORY FILES

1. Exploratory Data Analysis.ipynb
2. Preprocessing & Modeling.ipynb
3. Final Model Deployment.ipynb
Customer-Churn.csv
README.md
X_test.pkl
logistic_regression_model.pkl
raw_data.csv
y_test.pkl

Repository: RomainD91/Project-ML-Customer-Churn-Prediction
