Silent Model Degradation in Clinical AI: Detecting and Quantifying Undocumented Data Drift in Live EHR Systems


Sreeja Poduri
Chavali Sri Gowri
Lavanya Addepalli
Jaime Lloret

Abstract

Clinical prediction models deployed in healthcare increasingly drive high-stakes clinical decision support, yet they are seldom subjected to post-deployment performance monitoring beyond initial validation. This paper identifies and defines silent model degradation: a failure mode in which predictive validity deteriorates over time because undocumented data drift fails to trigger standard system warnings. Using a large longitudinal electronic health record (EHR) dataset from a tertiary care hospital system, we show that both covariate drift and concept drift accumulate gradually after deployment, producing progressive declines in discrimination, calibration, and prediction stability. To address this risk, we design CLIOPS, a unified post-deployment monitoring framework that combines temporal drift detection, longitudinal modelling of performance deterioration, and unsupervised early warnings that do not depend on immediate outcome labels. In comparative analysis, CLIOPS imposes less operational load and detects degradation earlier and more consistently than existing feature-based and label-dependent drift detection procedures. These results demonstrate that clinically consequential performance loss can remain hidden, and that safe, reliable deployment of clinical AI must be accompanied by label-free monitoring.
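The label-free covariate-drift monitoring the abstract describes can be illustrated with a minimal sketch. The snippet below is not the CLIOPS implementation (which the paper does not specify here); it is an assumed, simplified example using the population stability index (PSI), a common label-free statistic that compares a live feature distribution against a training-era reference window, with the conventional alarm threshold of 0.2.

```python
import numpy as np

def psi(reference, live, bins=10):
    """Population Stability Index between a reference and a live feature
    distribution. Values above ~0.2 are conventionally treated as a drift alarm.
    Bin edges come from reference quantiles, so the comparison is label-free."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    # Clip both samples into the reference range so all mass falls in a bin.
    ref_counts = np.histogram(np.clip(reference, edges[0], edges[-1]), edges)[0]
    live_counts = np.histogram(np.clip(live, edges[0], edges[-1]), edges)[0]
    ref_frac = np.clip(ref_counts / len(reference), 1e-6, None)   # avoid log(0)
    live_frac = np.clip(live_counts / len(live), 1e-6, None)
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))

# Hypothetical lab-value feature: a stable cohort and a post-deployment
# cohort whose mean and spread have drifted (e.g., a new assay or case mix).
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)   # training-era values
stable = rng.normal(0.0, 1.0, 5000)     # same distribution, later window
shifted = rng.normal(0.8, 1.2, 5000)    # drifted distribution

print("stable PSI :", round(psi(baseline, stable), 3))
print("shifted PSI:", round(psi(baseline, shifted), 3))
```

Because the statistic needs only incoming feature values, it can flag degradation risk long before enough labelled outcomes accrue to measure discrimination or calibration directly, which is the core argument the abstract makes for label-free monitoring.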

Article Details

How to Cite
Sreeja Poduri, Chavali Sri Gowri, Lavanya Addepalli, and Jaime Lloret, “Silent Model Degradation in Clinical AI: Detecting and Quantifying Undocumented Data Drift in Live EHR Systems”, Int. J. Comput. Eng. Res. Trends, vol. 11, no. 12, pp. 42–56, Dec. 2024.
Section
Research Articles

References

J. G. Moreno-Torres, T. Raeder, R. Alaiz-Rodríguez, N. V. Chawla, and F. Herrera, “A unifying view on dataset shift in classification,” Pattern Recognition, vol. 45, no. 1, pp. 521–530, Jan. 2012, doi: 10.1016/j.patcog.2011.06.019.

R. Li et al., “Cardiovascular Disease Risk Prediction Based on Random Forest,” in Proceedings of the 2nd International Conference on Healthcare Science and Engineering, vol. 536, C. Q. Wu, M.-C. Chyu, J. Lloret, and X. Li, Eds., Singapore: Springer Singapore, 2019, pp. 31–43. doi: 10.1007/978-981-13-6837-0_3.

N. H. Shah, A. Milstein, and S. C. Bagley, “Making Machine Learning Models Clinically Useful,” JAMA, vol. 322, no. 14, p. 1351, Oct. 2019, doi: 10.1001/jama.2019.10306.

S. G. Finlayson et al., “The Clinician and Dataset Shift in Artificial Intelligence,” N Engl J Med, vol. 385, no. 3, pp. 283–286, Jul. 2021, doi: 10.1056/NEJMc2104626.

A. Soin et al., “CheXstray: Real-time Multi-Modal Data Concordance for Drift Detection in Medical Imaging AI,” 2022, arXiv. doi: 10.48550/ARXIV.2202.02833.

Z. Young and R. Steele, “Empirical evaluation of performance degradation of machine learning-based predictive models – A case study in healthcare information systems,” International Journal of Information Management Data Insights, vol. 2, no. 1, p. 100070, Apr. 2022, doi: 10.1016/j.jjimei.2022.100070.

H. Q. Nguyen et al., “VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations,” Sci Data, vol. 9, no. 1, p. 429, Jul. 2022, doi: 10.1038/s41597-022-01498-w.

S. E. Davis, C. G. Walsh, and M. E. Matheny, “Open questions and research gaps for monitoring and updating AI-enabled tools in clinical settings,” Front. Digit. Health, vol. 4, p. 958284, Sep. 2022, doi: 10.3389/fdgth.2022.958284.

K. Rahmani et al., “Assessing the effects of data drift on the performance of machine learning models used in clinical sepsis prediction,” International Journal of Medical Informatics, vol. 173, p. 104930, May 2023, doi: 10.1016/j.ijmedinf.2022.104930.

F. Di Martino and F. Delmastro, “Explainable AI for clinical and remote health applications: a survey on tabular and time series data,” Artif Intell Rev, vol. 56, no. 6, pp. 5261–5315, Jun. 2023, doi: 10.1007/s10462-022-10304-3.

A. R. M. S., N. C. R., S. B. R., H. Lahza, and H. F. M. Lahza, “A survey on detecting healthcare concept drift in AI/ML models from a finance perspective,” Front. Artif. Intell., vol. 5, p. 955314, Apr. 2023, doi: 10.3389/frai.2022.955314.

S. P. Shashikumar, F. Amrollahi, and S. Nemati, “Unsupervised Detection and Correction of Model Calibration Shift at Test-Time,” in 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia: IEEE, Jul. 2023, pp. 1–4. doi: 10.1109/EMBC40787.2023.10341086.

National Academy of Medicine, The Learning Health System Series; D. Whicher, M. Ahmed, S. T. Israni, and M. Matheny, Eds., “Deploying Artificial Intelligence in Clinical Settings,” in Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril, National Academies Press (US), 2023. Accessed: Feb. 01, 2026. [Online]. Available: https://www.ncbi.nlm.nih.gov/books/NBK605954/

B. Sahiner, W. Chen, R. K. Samala, and N. Petrick, “Data drift in medical machine learning: implications and potential remedies,” The British Journal of Radiology, vol. 96, no. 1150, p. 20220878, Oct. 2023, doi: 10.1259/bjr.20220878.

A. Rajagopal et al., “Machine Learning Operations in Health Care: A Scoping Review,” Mayo Clinic Proceedings: Digital Health, vol. 2, no. 3, pp. 421–437, Sep. 2024, doi: 10.1016/j.mcpdig.2024.06.009.