A Robust Ensemble Learning Framework for Pancreatic Cancer Classification Using High-Dimensional Gene Expression Data
Abstract
Accurate classification of pancreatic cancer from gene expression data is challenging because of high dimensionality, class imbalance, and small sample sizes. This study proposes a robust machine learning framework that integrates comprehensive data preprocessing, statistical feature selection, dimensionality reduction with Principal Component Analysis (PCA), and ensemble learning to improve cancer classification. The methodology was evaluated on a microarray dataset of 51 samples (36 tumor and 15 normal) with 54,675 gene features. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to address class imbalance, and SelectKBest with ANOVA F-values was used to retain the 1,000 most predictive features. Multiple classifiers, including Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naïve Bayes, and Decision Tree, were evaluated individually and within ensemble models. The ensemble models, in particular Voting, Stacking, and Random Forest, achieved 100% balanced accuracy and F1-score, significantly outperforming traditional approaches, including those enhanced by Particle Swarm Optimization. These results indicate strong generalization and classification performance and suggest a promising strategy for the early and accurate detection of pancreatic cancer from gene expression data.
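The pipeline summarized above (SMOTE, ANOVA-based SelectKBest, PCA, then a Voting or Stacking ensemble over five base classifiers) can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code. The data are random stand-ins that reuse the abstract's class sizes (36 tumor, 15 normal) but have only 2,000 genes instead of 54,675, with a hypothetical signal planted in 50 genes. `smote` is a hypothetical, simplified NumPy re-implementation of SMOTE rather than the imbalanced-learn class. Standardization, 10 PCA components, 5-fold stratified cross-validation, a linear SVM, hard voting, and a logistic-regression meta-learner are illustrative choices that the abstract does not specify. Scores printed on this synthetic data say nothing about the real dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def smote(X, y, k=5, seed=0):
    """Simplified SMOTE (hypothetical helper): oversample the minority class to parity.

    Each synthetic row lies on the segment between a minority sample and one of
    its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours and drop column 0, which is each point itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)         # random minority anchors
    partner = neigh[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))                           # interpolation weight in [0, 1)
    X_new = X_min[base] + gap * (X_min[partner] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def base_learners():
    # The five classifier families named in the abstract; hyperparameters are assumptions.
    return [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(kernel="linear")),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ]


def build_model(kind):
    if kind == "voting":
        clf = VotingClassifier(base_learners(), voting="hard")
    else:
        clf = StackingClassifier(base_learners(),
                                 final_estimator=LogisticRegression(max_iter=1000))
    return Pipeline([
        ("scale", StandardScaler()),                 # assumed preprocessing step
        ("select", SelectKBest(f_classif, k=1000)),  # top 1,000 genes by ANOVA F-value
        ("pca", PCA(n_components=10)),               # component count is an assumption
        ("clf", clf),
    ])


def cross_validate(X, y, kind, n_splits=5, seed=0):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    y_true, y_pred = [], []
    for train, test in skf.split(X, y):
        # Oversample the training fold only, so no synthetic row is built from test data.
        X_tr, y_tr = smote(X[train], y[train], seed=seed)
        model = build_model(kind).fit(X_tr, y_tr)
        y_true.extend(y[test])
        y_pred.extend(model.predict(X[test]))
    return balanced_accuracy_score(y_true, y_pred), f1_score(y_true, y_pred)


# Synthetic stand-in data (assumption): class sizes from the abstract, 2,000 genes.
rng = np.random.default_rng(42)
n_tumor, n_normal, n_genes = 36, 15, 2000
X = rng.normal(size=(n_tumor + n_normal, n_genes))
y = np.array([1] * n_tumor + [0] * n_normal)  # 1 = tumor, 0 = normal
X[y == 1, :50] += 3.0                          # hypothetical signal in 50 genes

for kind in ("voting", "stacking"):
    bal_acc, f1 = cross_validate(X, y, kind)
    print(f"{kind}: balanced accuracy = {bal_acc:.3f}, F1 = {f1:.3f}")
```

The design choice worth noting is that SMOTE and SelectKBest are fit inside each training fold, so no synthetic sample or feature ranking is derived from test data. With only 51 samples, running either step on the full dataset before splitting can leak information and inflate scores toward 100%; the abstract does not say in which order the original study applied them. With imbalanced-learn installed, `imblearn.pipeline.Pipeline` can hold `imblearn.over_sampling.SMOTE` as a step, and `cross_val_score` then applies the resampling to training folds only.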
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.