Heart Disease Prediction Using Machine Learning with Recursive Feature Elimination for Optimized Performance
Abstract
Heart disease remains one of the leading causes of mortality worldwide, underscoring the need for early diagnosis and effective predictive models to improve patient outcomes. This research develops a machine learning model for heart disease detection that uses Recursive Feature Elimination (RFE) for feature selection. The primary objective is to improve model performance, interpretability, and computational efficiency by eliminating irrelevant and redundant features while retaining the most significant predictors. The study evaluates three machine learning algorithms, Logistic Regression, Random Forest, and Support Vector Machine (SVM), each with and without RFE, to assess the impact of feature selection on performance. The methodology comprises data preprocessing, including normalization and scaling, followed by RFE-based feature selection and model evaluation using accuracy, precision, recall, F1-score, and ROC-AUC. Experimental results show that the Random Forest model achieves the highest performance, with 99% accuracy and a ROC-AUC of 1.00, making it the strongest of the three models evaluated. The RFE-based Logistic Regression model, by contrast, offers better interpretability and lower complexity at the cost of slightly lower performance metrics. These findings highlight the effectiveness of RFE in optimizing feature selection and illustrate the trade-off between accuracy and model transparency. This research contributes to healthcare analytics by demonstrating that feature selection techniques can be used to build interpretable, scalable, and accurate models for heart disease prediction. Future work will focus on expanding the dataset, incorporating temporal data, and exploring hybrid models to further improve predictive performance and clinical applicability.
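The evaluation design summarized in the abstract (scale the features, optionally prune them with RFE, then fit and score Logistic Regression, Random Forest, and SVM on five metrics) can be sketched with scikit-learn. This is a minimal, hypothetical sketch rather than the authors' code. Assumptions: the data are a synthetic stand-in generated by `make_classification`; the number of retained features (`N_SELECTED = 8`), the 75/25 train/test split, and all hyperparameters are illustrative; `StandardScaler` stands in for the unspecified normalization and scaling step; each classifier serves as its own RFE ranking estimator; and the SVM uses a linear kernel so that RFE can rank features by their coefficients. The scores it prints will not reproduce the results reported in the abstract.

```python
# A minimal sketch, not the study's code. ASSUMPTIONS: synthetic data stands in
# for the heart disease dataset; N_SELECTED, the split, and all hyperparameters
# are hypothetical; the SVM uses a linear kernel so RFE can rank features by coef_.
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 13 features, only some of which carry signal.
X, y = make_classification(n_samples=600, n_features=13, n_informative=5,
                           n_redundant=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

MODELS = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": SVC(kernel="linear", probability=True, random_state=0),
}
N_SELECTED = 8  # hypothetical number of features RFE keeps


def build_pipeline(model, use_rfe):
    """Standardize features, then classify with or without RFE pruning."""
    est = clone(model)
    if use_rfe:
        # RFE refits `est` repeatedly, dropping the lowest-ranked feature
        # (by |coef_| or feature_importances_) until N_SELECTED remain, then
        # predicts with the model refit on the surviving features.
        est = RFE(est, n_features_to_select=N_SELECTED, step=1)
    return Pipeline([("scale", StandardScaler()), ("clf", est)])


def evaluate(pipe):
    """Fit on the training split and report the five metrics on the test split."""
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    score = pipe.predict_proba(X_test)[:, 1]  # P(class 1), used for ROC-AUC
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, score),
    }


# Every model is evaluated twice: on all features and after RFE.
results = {(name, use_rfe): evaluate(build_pipeline(model, use_rfe))
           for name, model in MODELS.items() for use_rfe in (False, True)}

for (name, use_rfe), metrics in results.items():
    label = "RFE" if use_rfe else "all"
    print(f"{name:14s} {label:3s} "
          + " ".join(f"{k}={v:.3f}" for k, v in metrics.items()))

# Which features survive RFE with logistic regression (the interpretable model)?
lr_rfe = build_pipeline(MODELS["logreg"], use_rfe=True).fit(X_train, y_train)
kept = lr_rfe.named_steps["clf"].support_
print("features kept:", kept.nonzero()[0].tolist())
```

Placing the scaler and RFE inside the `Pipeline` means both are fit on the training split only, so the test-set metrics are not inflated by information leaking from the test data. Varying `N_SELECTED`, or the estimator RFE uses to rank features, is a natural way to explore the accuracy-versus-transparency trade-off the abstract describes.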
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.