Improving Customer Churn Detection Through Balanced Ensemble Learning

Authors

  • Didiek Trisatya, Universitas Pancasakti Tegal
  • Priyo Haryoko, Universitas Pancasakti Tegal

DOI:

https://doi.org/10.33795/jip.v12i3.9500

Keywords:

Customer Churn, Ensemble Learning, SMOTE, Imbalanced Data, Machine Learning

Abstract

Predicting customer churn is a major challenge for telecommunication providers: fierce market competition and frequent customer switching can significantly threaten long-term revenue stability, and failure to identify customers with high churn potential often leads to ineffective retention strategies. This study examines whether combining data balancing techniques with ensemble learning models improves churn prediction on imbalanced datasets. A quantitative experimental method is applied to a publicly available telecommunications dataset. Preprocessing handles incomplete records, encodes categorical attributes numerically, and scales feature values. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is applied exclusively to the training data. Three classifiers are evaluated: Logistic Regression as a baseline and two ensemble methods, Extreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM). Model performance is measured using accuracy, precision, recall, F1-score, and the Area Under the ROC Curve (AUC). The results show that the ensemble methods outperform Logistic Regression, particularly in recall and AUC. LightGBM achieves the best overall performance and remains stable across all evaluation measures. Feature importance analysis indicates that customer tenure and billing-related attributes, including monthly charges and total charges, are the dominant factors influencing churn behavior. These results indicate that integrating data balancing with ensemble learning offers a robust and effective approach to supporting proactive customer retention in the telecommunications sector.
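
The balancing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say which SMOTE implementation was used (in practice a library such as imbalanced-learn would be the usual choice). The function names `smote`, `balance_training_set`, and `precision_recall_f1`, the toy rows, and the choice of held-out indices are all hypothetical. The point it shows is the ordering: the test split is set aside first, and synthetic minority samples are generated from the training split only.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen point (itself excluded).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def balance_training_set(X_train, y_train, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    minority = [x for x, t in zip(X_train, y_train) if t == minority_label]
    n_majority = len(y_train) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X_train + new_points, y_train + [minority_label] * len(new_points)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (churn) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy rows: (tenure in months, monthly charge); label 1 = churn.
X = [(60, 40.0), (48, 55.0), (72, 30.0), (36, 60.0), (55, 45.0), (66, 35.0),
     (40, 50.0), (58, 42.0), (3, 90.0), (5, 85.0), (2, 95.0), (8, 80.0)]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

# Hold out the test rows first; SMOTE then touches the training part only.
test_idx = {0, 1, 8}
X_train = [x for i, x in enumerate(X) if i not in test_idx]
y_train = [t for i, t in enumerate(y) if i not in test_idx]
X_test = [X[i] for i in sorted(test_idx)]
y_test = [y[i] for i in sorted(test_idx)]

X_bal, y_bal = balance_training_set(X_train, y_train, k=2)
print(len(X_train), "training rows before balancing,", len(X_bal), "after")
```

A classifier would then be fitted on `X_bal` and `y_bal` and scored on the untouched `X_test` and `y_test`, for example with `precision_recall_f1`. Keeping synthetic samples out of the test split means the reported recall and precision reflect the original class distribution.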

Published

2026-05-31

How to Cite

Didiek Trisatya, & Priyo Haryoko. (2026). Improving Customer Churn Detection Through Balanced Ensemble Learning. Jurnal Informatika Polinema, 12(3), 431–438. https://doi.org/10.33795/jip.v12i3.9500