Silent Model Degradation in Clinical AI: Detecting and Quantifying Undocumented Data Drift in Live EHR Systems
Abstract
Clinical prediction models embedded in healthcare systems increasingly drive high-stakes clinical decision support, yet they are seldom subjected to post-deployment performance monitoring beyond initial validation. This paper draws attention to and defines silent model degradation: a failure mode in which predictive validity deteriorates over time as a result of undocumented data drift that triggers no standard system warnings. Using a large longitudinal electronic health record (EHR) dataset from a tertiary care hospital system, we show that both covariate drift and concept drift accumulate gradually after deployment, producing progressive losses in discrimination, calibration, and prediction stability. To mitigate this risk, we design CLIOPS, a unified post-deployment monitoring framework that combines temporal drift detection, longitudinal modelling of performance deterioration, and unsupervised early warnings that do not depend on immediate outcome labels. In comparative analysis, CLIOPS imposes less operational load and detects degradation earlier and more consistently than existing feature-based and label-dependent drift detection procedures. These results demonstrate that clinically consequential performance loss can remain hidden, and that safe, reliable deployment of clinical AI must be accompanied by label-free monitoring.
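The abstract does not disclose CLIOPS internals, but the label-free covariate-drift signal it describes can be illustrated with a standard metric. The sketch below (an assumption for illustration, not the paper's method) computes a Population Stability Index (PSI) between a deployment-time reference window and a later monitoring window of a single EHR feature; no outcome labels are required:

```python
import numpy as np

def psi(reference, current, bins=10):
    """Population Stability Index between a reference window and a
    current window of one feature (label-free covariate-drift signal)."""
    # Bin edges from reference quantiles: robust to skewed clinical
    # variables such as lab values.
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # capture out-of-range values
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    cur_frac = np.histogram(current, edges)[0] / len(current)
    # Floor avoids log(0) and division by zero in sparsely populated bins.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
baseline = rng.normal(100, 15, 5000)  # e.g., glucose values at deployment
stable   = rng.normal(100, 15, 5000)  # same distribution -> low PSI
shifted  = rng.normal(115, 20, 5000)  # drifted distribution -> high PSI

print(psi(baseline, stable))   # well below 0.1 (commonly read as "stable")
print(psi(baseline, shifted))  # well above 0.25 (commonly read as "significant drift")
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift; in a monitoring pipeline this statistic would be computed per feature per time window and alarmed on threshold crossings, which is the general shape of the early-warning behaviour the abstract attributes to CLIOPS.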
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.