Telecom Customer Churn Predictor

Python · XGBoost · Scikit-learn · Pandas · NumPy · MLflow · FastAPI · Gradio · Great Expectations · Matplotlib

✦ Project Overview

A production-grade, end-to-end MLOps system that predicts which telecom customers are at risk of churning before they leave. Built with XGBoost, MLflow, FastAPI, and Gradio, the pipeline spans automated data validation, feature engineering, experiment tracking, and a live web interface, giving retention teams a real-time decision tool backed by rigorous ML engineering.

✦ Key Features

♥ Production-grade MLOps pipeline with automated data validation: all 23 Great Expectations checks pass, covering schema integrity, business-rule constraints, and statistical range checks before any model sees the data.
♥ XGBoost classifier trained on 7,043 customer records with deliberate class-imbalance handling (scale_pos_weight), achieving a ROC-AUC of 0.837 and 82.1% recall; serialized as an MLflow PyFunc model to keep serving consistent with training.
♥ FastAPI inference service with a /predict REST endpoint delivering 0.004 s latency and 370K+ samples/sec throughput, paired with a Gradio UI so non-technical stakeholders can run predictions without writing code.
♥ Feature engineering pipeline that expands 21 raw attributes to 31 production-consistent features, with column-order persistence to prevent train/serve skew across environments.
♥ Full MLflow experiment tracking covering precision, recall, ROC-AUC, training time, prediction latency, and validation results, so every experiment run can be audited and reproduced.

✦ Methodology

A structured MLOps workflow that treats every stage — from raw CSV to deployed API — as a production concern, not an afterthought:

01. Automated Data Validation

Before any preprocessing begins, Great Expectations runs 23 checks across schema, business rules (valid contract types, binary fields), and statistical ranges (tenure, monthly charges). The pipeline halts on any failure, so only data that passes every check reaches downstream stages.
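The three kinds of checks and the halt-on-failure behavior can be sketched in plain pandas. This is not the Great Expectations API and not the project's actual suite of 23 checks: the column names, allowed values, and the 0 to 120 month tenure window are illustrative assumptions, and `validate` and `validate_or_halt` are hypothetical helpers.

```python
import pandas as pd

# Hypothetical rule set: column names and bounds are assumptions for illustration.
REQUIRED_COLUMNS = ["tenure", "MonthlyCharges", "Contract", "Churn"]
ALLOWED_CONTRACTS = ["Month-to-month", "One year", "Two year"]


def validate(df):
    """Return a list of failure messages; an empty list means every check passed."""
    # Schema check: stop early if a required column is absent.
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    failures = []

    # Business rules: known contract types and a binary target.
    if not df["Contract"].isin(ALLOWED_CONTRACTS).all():
        failures.append("Contract contains an unknown category")
    if not df["Churn"].isin(["Yes", "No"]).all():
        failures.append("Churn is not binary")

    # Statistical ranges: tenure inside the assumed window, charges non-negative.
    if not df["tenure"].between(0, 120).all():
        failures.append("tenure out of range")
    if not (df["MonthlyCharges"] >= 0).all():
        failures.append("MonthlyCharges is negative")

    return failures


def validate_or_halt(df):
    """Raise on any failure so no later pipeline stage runs on bad data."""
    failures = validate(df)
    if failures:
        raise ValueError("validation failed: " + "; ".join(failures))
    return df
```

Collecting every failure before raising, rather than stopping at the first one, makes a single failed run report everything that needs fixing.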

02. Feature Engineering & Preprocessing

Deterministic binary mappings and one-hot encoding (pd.get_dummies with drop_first) expand the feature space from 21 to 31 columns. The final column order is persisted as an artifact so inference-time inputs are always aligned with the training schema.
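A minimal sketch of the encoding and column-order persistence described above, assuming the column list is stored as JSON. The helper names and example columns are hypothetical. One detail here is an assumption rather than something the project states: at inference time the sketch encodes without `drop_first` and then reindexes to the saved columns, because a single-row request contains only one category per field and `drop_first` would silently drop it.

```python
import json

import pandas as pd

BINARY_MAP = {"Yes": 1, "No": 0}  # deterministic binary mapping


def encode(df, binary_cols, categorical_cols, drop_first):
    """Map Yes/No columns to 1/0 and one-hot encode the categorical columns."""
    out = df.copy()
    for col in binary_cols:
        out[col] = out[col].map(BINARY_MAP)
    return pd.get_dummies(out, columns=categorical_cols,
                          drop_first=drop_first, dtype=int)


def fit_transform(train_df, binary_cols, categorical_cols):
    """Encode training data and return it with the column order serialized as JSON."""
    encoded = encode(train_df, binary_cols, categorical_cols, drop_first=True)
    return encoded, json.dumps(list(encoded.columns))


def transform_for_inference(df, binary_cols, categorical_cols, columns_json):
    """Encode a request and align it to the persisted training column order."""
    encoded = encode(df, binary_cols, categorical_cols, drop_first=False)
    # reindex drops columns unseen in training and adds absent ones as 0.
    return encoded.reindex(columns=json.loads(columns_json), fill_value=0)


train = pd.DataFrame({
    "Partner": ["Yes", "No", "No"],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "tenure": [1, 12, 48],
})
encoded, columns_json = fit_transform(train, ["Partner"], ["Contract"])

request = pd.DataFrame({"Partner": ["Yes"], "Contract": ["Two year"], "tenure": [5]})
served = transform_for_inference(request, ["Partner"], ["Contract"], columns_json)
```

Here `served` has exactly the training columns in the training order, so the model receives the same layout it was fitted on regardless of which categories appear in a given request.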

03. Model Training & Experiment Tracking

XGBoost is trained with scale_pos_weight to address the class imbalance (churners make up roughly 26% of records). Every run logs metrics (ROC-AUC 0.837, Recall 0.821, F1 0.614), timing stats, and model artifacts to MLflow, making results auditable and reproducible.
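A common heuristic for scale_pos_weight is the ratio of negative to positive training examples. The sketch below computes it for an illustrative label list with 26% positives; the counts are made up to mirror the stated imbalance, and whether the project uses exactly this ratio is an assumption. The `scale_pos_weight` function is a hypothetical helper.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive labels (1 = churned, 0 = retained)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive examples; the ratio is undefined")
    return negatives / positives


# Illustrative labels only: 26 churners out of 100 customers.
labels = [1] * 26 + [0] * 74
weight = scale_pos_weight(labels)  # 74 / 26, roughly 2.85

# Parameters that could then be passed to XGBoost, for example
# xgboost.XGBClassifier(**params); the other values are assumptions.
params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "scale_pos_weight": weight,
}
```

With this weighting, each churner counts about 2.85 times as much as a retained customer in the loss, which tends to raise recall at the cost of precision.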

04. Production Serving & UI

FastAPI exposes a /predict endpoint with sub-5 ms inference; Gradio provides a dark-themed, dropdown-driven web interface for business users. Both layers are Docker-compatible for straightforward cloud deployment.