laizaparizotto/churn-prediction-kedro

Churn Prediction with Kedro Framework

This is a Kedro repository that tackles a data science challenge of predicting customer churn for a fictional financial institution. The goal is to build an effective pipeline for a production-ready Machine Learning model to forecast customer churn accurately.

To approach this problem, I first carried out exploratory data analysis (EDA), feature engineering, and model training and evaluation in Jupyter Notebooks, located in "churn-prediction-kedro/churn-prediction/notebooks/". Feel free to visit the notebooks and check my reasoning behind the solution before running the pipeline. :)

Exploratory Data Analysis

Feature Engineering

Model Training and Evaluation

Data Understanding:

  • The first dataset, Abandono_clientes, contains 10,000 rows and 13 columns, including the binary target column "Exited" (1 if the customer has churned, 0 if not).
  • The second dataset, Abandono_teste, contains 1,000 rows and 12 columns; it does not include the Exited column.
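A quick way to confirm these two properties (the test set lacks only the `Exited` column, and the share of churners in the training set) is sketched below with the standard `csv` module. It is a minimal sketch: the CSV text in the example is a made-up three-row stand-in, and applying it to the real files assumes they are CSVs with a header row.

```python
import csv
import io


def read_csv_rows(text):
    """Parse CSV text (header row first) into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def churn_rate(rows, target="Exited"):
    """Share of rows whose target column is "1" (customers who churned)."""
    if not rows:
        raise ValueError("no rows to summarise")
    return sum(row[target] == "1" for row in rows) / len(rows)


def schema_difference(train_columns, test_columns):
    """Columns found in the training set but not in the test set."""
    return sorted(set(train_columns) - set(test_columns))


# Tiny made-up stand-ins for Abandono_clientes and Abandono_teste.
train_text = "CustomerId,Age,Exited\n1,42,1\n2,35,0\n3,50,0\n"
test_text = "CustomerId,Age\n4,29\n"

train_rows = read_csv_rows(train_text)
test_rows = read_csv_rows(test_text)
print(schema_difference(train_rows[0].keys(), test_rows[0].keys()))  # ['Exited']
print(round(churn_rate(train_rows), 3))  # 0.333
```

On the real data, `schema_difference` is expected to return only `['Exited']`.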

Key Concepts:

Customer Churn: Churn refers to the phenomenon of customers discontinuing their relationship with a company or service. In this context, it represents customers who have abandoned the financial institution.

Features: The dataset contains attributes that describe each customer: Row Number, Customer Id, Surname, Credit Score, Geography, Gender, Age, Tenure (duration of the customer's relationship with the bank), Balance, Number of Products Held, Has a Credit Card, Is Active Member and Estimated Salary.
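Row Number, Customer Id and Surname identify a customer rather than describe their behaviour, while Geography and Gender are categorical, so a common preparation step drops the former and one-hot encodes the latter. The sketch below shows that idea on a single record. It is an assumption about the approach, not the repository's feature-engineering code, and the column names (`RowNumber`, `CustomerId`, `Surname`, `Geography`, `Gender`) and category values are assumed; for a whole DataFrame, `pandas.get_dummies` does the same encoding.

```python
# Assumed column names and category values; check them against the actual CSV header.
ID_COLUMNS = ("RowNumber", "CustomerId", "Surname")
CATEGORIES = {
    "Geography": ("France", "Germany", "Spain"),
    "Gender": ("Female", "Male"),
}


def prepare_row(row):
    """Drop identifier columns and one-hot encode the categorical ones.

    Returns a new dict; all other columns are passed through unchanged.
    """
    features = {}
    for column, value in row.items():
        if column in ID_COLUMNS:
            continue  # identifiers carry no behavioural signal
        if column in CATEGORIES:
            # One 0/1 indicator per known category, e.g. Geography_France.
            for category in CATEGORIES[column]:
                features[f"{column}_{category}"] = int(value == category)
        else:
            features[column] = value
    return features


# Made-up example record.
example = {"RowNumber": 1, "CustomerId": 1001, "Surname": "Doe",
           "CreditScore": 619, "Geography": "France", "Gender": "Female", "Age": 42}
print(prepare_row(example))
```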

Exited: The target variable Exited indicates whether a customer has churned (1) or not (0).

Performance Metrics: To assess the model, several evaluation metrics are used: accuracy, precision, recall, F1-score and the area under the ROC curve (AUC-ROC). Together they gauge the model's overall predictive capability and, in particular, its ability to correctly identify customers who are likely to churn.
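For reference, the metrics listed above can be computed from confusion-matrix counts as sketched below, with churn (1) as the positive class. This is a plain-Python illustration, not the project's evaluation code; in practice `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score`, `f1_score` and `roc_auc_score`. AUC-ROC is computed here as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner (ties count as one half), which equals the area under the ROC curve.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = churned)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC-ROC as P(score of a random positive > score of a random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1]  # predicted churn probabilities
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.6, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
print(roc_auc(y_true, scores))  # 5/6, about 0.833
```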

Getting started

Note that this project was initially developed with Python 3.10.6 on Ubuntu.

Clone the repository

To clone the repository and set up the development environment, follow the steps below:

  1. Clone the repository using the command:

    git clone https://github.com/laizaparizotto/churn-prediction-kedro.git
    
  2. Change to the cloned repository directory:

    cd churn-prediction-kedro
    
  3. Create a virtual environment using venv:

    python -m venv .venv
    
  4. Activate the virtual environment:

    • For Windows:
      .venv\Scripts\activate
      
    • For macOS and Linux:
      source .venv/bin/activate
      

You have now cloned the repository and set up the virtual environment, and can continue with the steps below.
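To double-check the environment, the snippet below reports whether the running interpreter belongs to an activated virtual environment (with `venv`, `sys.prefix` differs from `sys.base_prefix`) and whether it is on the Python 3.10 line the project was developed with. Treating any 3.10.x release as matching is an assumption; other versions may work as well.

```python
import sys


def in_virtualenv():
    """True inside a venv, where sys.prefix points at the environment directory."""
    return sys.prefix != sys.base_prefix


def matches_dev_python(version_info=sys.version_info, expected=(3, 10)):
    """True when major.minor equals the version used during development."""
    return tuple(version_info[:2]) == expected


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python", sys.version.split()[0], "matches 3.10:", matches_dev_python())
```

Saved under a name of your choice (for example `check_env.py`, a hypothetical name), it can be run with `python check_env.py` while the environment is active.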

Install Kedro

To install Kedro, run the commands below from the repository root. For more information, please check the Kedro Installation Documentation.

cd churn-prediction/
pip install kedro

Install dependencies

All necessary dependencies are listed in src/requirements.txt (inside churn-prediction/).

To install them, run:

pip install -r src/requirements.txt
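After installing, a rough way to confirm that every package in src/requirements.txt (and Kedro itself) is present in the active environment is to look each distribution up with `importlib.metadata`. The parser is deliberately simplified: it assumes plain `name`, `name==version` or `name[extra]>=version` lines, skips comments and pip options such as `-r`, and does not implement the full requirement-specifier grammar. The helper names are hypothetical.

```python
import re
from importlib import metadata

# Leading run of characters that can appear in a distribution name.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(lines):
    """Extract distribution names from requirements-style lines (simplified)."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # blank line or pip option such as -r / -e
        match = _NAME.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_requirements(lines):
    """Names whose distribution metadata cannot be found in this environment."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def check_requirements_file(path="src/requirements.txt"):
    """Report missing packages from a requirements file, plus Kedro itself."""
    with open(path, encoding="utf-8") as fh:
        lines = fh.readlines()
    return missing_requirements(lines + ["kedro"])
```

From the churn-prediction/ directory, `print(check_requirements_file())` should print `[]` when everything, including Kedro, is installed.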

How to run the pipeline

You can run the Kedro project with:

kedro run

This runs the pipeline, which loads the data, preprocesses it, trains and evaluates a RandomForestClassifier, and finally generates predictions for the test set.
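Kedro pipelines are built from nodes, each a plain Python function with named inputs and outputs, and Kedro derives the execution order from those names. The toy runner below imitates that idea in plain Python so the data flow of the steps above can be followed without Kedro installed. It is not Kedro's API and not the project's code: the dataset names, the node functions, and the majority-class "model" standing in for `RandomForestClassifier` are all hypothetical.

```python
def run_pipeline(nodes, catalog):
    """Run (func, inputs, output) nodes once all of their named inputs exist.

    `catalog` maps dataset names to values; each node's result is stored
    under its output name, loosely mirroring how Kedro wires nodes by name.
    """
    catalog = dict(catalog)
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(name in catalog for name in n[1])]
        if not ready:
            raise ValueError("unresolvable inputs (cycle or missing dataset)")
        for func, inputs, output in ready:
            catalog[output] = func(*(catalog[name] for name in inputs))
        pending = [n for n in pending if n not in ready]
    return catalog


# Hypothetical node functions standing in for the project's real ones.
def preprocess(rows):
    """Drop the identifier column from every record."""
    return [{k: v for k, v in row.items() if k != "CustomerId"} for row in rows]


def train(rows):
    """Majority-class baseline used here in place of RandomForestClassifier."""
    labels = [row["Exited"] for row in rows]
    return max(set(labels), key=labels.count)


def evaluate(model, rows):
    """Share of training rows whose label equals the constant prediction."""
    return sum(row["Exited"] == model for row in rows) / len(rows)


def predict(model, rows):
    """Predict the same class for every test record."""
    return [model for _ in rows]


# Deliberately listed out of order: execution order comes from the names.
nodes = [
    (predict, ["model", "test_features"], "predictions"),
    (preprocess, ["raw_train"], "train_features"),
    (preprocess, ["raw_test"], "test_features"),
    (train, ["train_features"], "model"),
    (evaluate, ["model", "train_features"], "train_accuracy"),
]

raw_train = [
    {"CustomerId": 1, "Age": 42, "Exited": 1},
    {"CustomerId": 2, "Age": 35, "Exited": 0},
    {"CustomerId": 3, "Age": 50, "Exited": 0},
]
raw_test = [{"CustomerId": 4, "Age": 29}]

result = run_pipeline(nodes, {"raw_train": raw_train, "raw_test": raw_test})
print(result["predictions"])  # [0]
```

In the actual project, the equivalent wiring is declared with Kedro's `node` and `Pipeline` objects, and the datasets are loaded from and saved to the Data Catalog.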

Final results will be stored in churn-prediction/data/07_model_output/resultado_teste.csv (relative to the repository root).

Interactive Visualization

You can access the interactive visualization with:

kedro viz

The final pipeline:

[Figure: final pipeline visualization]
