Optimization of Heart Failure Classification on Imbalanced Data Using a Supervised Learning Approach Based on Logistic Regression, Random Forest, and K-Nearest Neighbor
Optimalisasi Klasifikasi Gagal Jantung pada Data Imbalanced Menggunakan Pendekatan Supervised Learning Berbasis Regresi Logistik, Random Forest, dan K-Nearest Neighbor
DOI:
https://doi.org/10.33795/jip.v12i1.9071
Keywords:
Class Imbalance, Heart Failure, K-Nearest Neighbor, Logistic Regression, Random Forest, SMOTE
Abstract
Heart failure remains one of the leading causes of mortality worldwide, posing significant challenges for early diagnosis and patient management. A major obstacle in developing predictive models for heart failure is class imbalance: surviving patients far outnumber those who experience a death event. This imbalance often biases machine learning algorithms toward the majority class, reducing their sensitivity to the critical minority cases. To address this issue, this study applies the Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset and improve model performance. Three supervised learning algorithms, namely Logistic Regression (LR), Random Forest (RF), and K-Nearest Neighbor (KNN), were implemented and compared on the UCI Heart Failure Clinical Records dataset, which contains 299 patient samples with 13 clinical attributes. Experimental results show that the Random Forest model achieved the highest performance, scoring 90% on accuracy, precision, recall, and F1-score and outperforming both LR and KNN. The findings indicate that combining data balancing with ensemble learning improves prediction accuracy and sensitivity toward the minority class. The main contribution of this research lies in optimizing supervised models for medical data with skewed class distributions, providing a more reliable and interpretable approach for early heart failure detection. Future research may extend this work by integrating advanced ensemble or hybrid deep learning models and by expanding the dataset for multi-institutional validation.
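The abstract does not give implementation details, so the following is only a minimal sketch of the interpolation idea behind SMOTE, written in plain Python under stated assumptions. The function name `smote`, its parameters, and the toy data are hypothetical and are not taken from the study; the authors may well have used a library implementation such as `SMOTE` from imbalanced-learn. Each synthetic point is placed on the line segment between a randomly chosen minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: (feature_1, feature_2) rows, not the clinical records.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 12.0)]

# Generate just enough synthetic minority points to match the majority count.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "8 8"
```

In a real pipeline, oversampling is usually applied to the training split only, so that synthetic points do not leak into the data used to evaluate the LR, RF, and KNN classifiers.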






