A Robust Ensemble Learning Framework for Pancreatic Cancer Classification Using High-Dimensional Gene Expression Data


Sreeja Poduri

Abstract

Accurate classification of pancreatic cancer from gene expression data is challenging because of high dimensionality, class imbalance, and small sample sizes. This study proposes a robust machine learning framework that combines comprehensive data preprocessing, statistical feature selection, dimensionality reduction with Principal Component Analysis (PCA), and ensemble learning to improve cancer classification. The methodology was evaluated on a microarray dataset of 51 samples (36 tumor and 15 normal) described by 54,675 gene features. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to address class imbalance, and SelectKBest with ANOVA F-values was used to select the 1,000 most predictive features. Several classifiers, including Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naïve Bayes, and Decision Tree, were evaluated individually and within ensemble models. Results show that the ensemble models, particularly Voting, Stacking, and Random Forest, achieved 100% balanced accuracy and F1-score, outperforming traditional approaches, including those enhanced by Particle Swarm Optimization (PSO). The proposed methodology shows strong generalization and classification performance, offering a promising strategy for the early and accurate detection of pancreatic cancer from gene expression data.
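The pipeline summarized in the abstract can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the author's code. It uses a synthetic stand-in dataset from `make_classification` (51 samples, 2,000 features, about a 36:15 class split) instead of the 54,675-feature microarray data. A small hand-written `smote` helper (hypothetical, written here to avoid an extra dependency) takes the place of a library SMOTE implementation. The stage order, the feature scaling step, the PCA size (`n_components=10`), the base-learner hyperparameters, soft voting, and the logistic-regression meta-learner for stacking are all assumed settings. SMOTE is fitted only on the training portion of each cross-validation fold, so synthetic samples never reach a test fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def smote(X, y, k=5, seed=0):
    """Hypothetical minimal SMOTE for a binary problem: adds synthetic minority
    samples, each interpolated between a minority sample and one of its k nearest
    minority neighbours, until both classes have the same size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X, y
    k = min(k, len(X_min) - 1)
    # k + 1 neighbours because each point's nearest neighbour is itself (column 0).
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_min).kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)
    neighbour = idx[base, rng.integers(1, k + 1, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[neighbour] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])


def build_model(kind="voting", k_features=1000, n_components=10):
    """Scaling -> ANOVA F-test selection -> PCA -> ensemble (settings assumed)."""
    base = [
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("svm", SVC(kernel="linear", probability=True, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ]
    if kind == "voting":
        ensemble = VotingClassifier(base, voting="soft")
    else:
        ensemble = StackingClassifier(
            base, final_estimator=LogisticRegression(max_iter=1000), cv=3)
    return make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k_features),
                         PCA(n_components=n_components), ensemble)


def evaluate(X, y, kind, n_splits=5, seed=0):
    """Stratified CV; SMOTE is fitted on each training fold only."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    y_true, y_pred = [], []
    for train, test in folds.split(X, y):
        X_tr, y_tr = smote(X[train], y[train], seed=seed)
        model = build_model(kind).fit(X_tr, y_tr)
        y_true.extend(y[test])
        y_pred.extend(model.predict(X[test]))
    # Class 1 ("tumor" in this synthetic stand-in) is the positive class for F1.
    return balanced_accuracy_score(y_true, y_pred), f1_score(y_true, y_pred)


if __name__ == "__main__":
    # Synthetic stand-in: 51 samples, about 15 class-0 ("normal") and 36 class-1 ("tumor").
    X, y = make_classification(n_samples=51, n_features=2000, n_informative=20,
                               n_clusters_per_class=1, weights=[0.29],
                               flip_y=0, random_state=0)
    for kind in ("voting", "stacking"):
        bal_acc, f1 = evaluate(X, y, kind)
        print(f"{kind}: balanced accuracy={bal_acc:.3f}, F1={f1:.3f}")
```

Running the script prints cross-validated balanced accuracy and F1 for the voting and stacking ensembles on synthetic data. These numbers say nothing about the results reported in the study. Balanced accuracy is the mean of sensitivity and specificity, so unlike plain accuracy it is not inflated by a model that favors the majority (tumor) class. Keeping resampling, feature selection, and PCA inside the cross-validation loop matters with only 51 samples: fitting any of them on the full dataset before splitting lets information from test samples leak into training and can inflate the scores.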

Article Details

How to Cite
Sreeja Poduri, “A Robust Ensemble Learning Framework for Pancreatic Cancer Classification Using High-Dimensional Gene Expression Data”, Int. J. Comput. Eng. Res. Trends, vol. 9, no. 12, pp. 290–303, Dec. 2022.
Section
Research Articles
