A Deep Dive into Code Smell and Vulnerability Using Machine Learning and Deep Learning Techniques

Kritika

doi:10.22362/ijcert/2024/v11/i4/v11i404

Published: Apr 30, 2024

Keywords:

Code smell, Vulnerability, Deep Learning, Machine Learning, Software Metrics, Java Applications

Kritika

Independent Researcher, New Delhi, India

Abstract

Sustainable software development practices are essential for ensuring code quality, maintainability, and security. However, traditional approaches often overlook code smells and vulnerabilities, leading to technical debt and security risks. This paper presents a comprehensive analysis of code smells and vulnerabilities in Java applications using machine learning and deep learning techniques. The study curates datasets from 25 Java applications, using tools such as PMD, JDeodorant, IntelliJ IDEA, and SciTools Understand to detect code smells and vulnerabilities and to compute software metrics. The experimental approach applies supervised machine learning algorithms and deep learning models, including Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), to the pre-processed datasets. The results show that the JRip algorithm performs best for the vulnerabilities Law of Demeter (81.51% accuracy), Too Many Methods (97.09% accuracy), and Local Variable Could Be Final (88.07% accuracy), while the J48 algorithm reaches 96.2% accuracy for Bean Members Should Serialize. PMD outperforms IntelliJ IDEA in detecting the code smells God Class and Long Method, exceeding 90% accuracy on both. Additionally, the study establishes a relationship between code smells and vulnerabilities, with algorithms such as J48 and JRip effectively identifying patterns across both. Among the deep learning techniques, CNN achieves higher accuracy than RNN for the code smells God Class (90.08% vs. 86.78%) and Long Method (89.18% vs. 81.08%). For vulnerabilities the results are mixed: CNN performs better on Law of Demeter (96.77% accuracy) and Cyclomatic Complexity (92.64% accuracy), while RNN performs better on Bean Members Should Serialize (88.4% accuracy) and Too Many Methods (94.28% accuracy).
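The supervised setup summarized in the abstract (software metrics as features, a detected smell or rule violation as the label, accuracy as the score) can be illustrated with a deliberately small sketch. It does not reproduce the study's JRip or J48 models, which are Weka classifiers. Instead it fits a one-rule decision stump, a much simpler learner, in plain Python. The metric columns (WMC, number of methods, lines of code), the class `StumpRule`, and the functions `fit_stump` and `accuracy` are hypothetical names introduced for illustration, and the values are made-up toy data, not data from the paper.

```python
"""Minimal decision-stump sketch: predict a code smell from software metrics.

Hypothetical illustration only: this is NOT the JRip/J48 (Weka) setup used in
the study, and the metric values below are made-up toy data.
"""
from dataclasses import dataclass
from typing import Optional, Sequence

Row = Sequence[float]


@dataclass(frozen=True)
class StumpRule:
    """One threshold test: flag a class as smelly when row[feature] > threshold."""
    feature: int
    threshold: float

    def predict(self, row: Row) -> bool:
        return row[self.feature] > self.threshold


def accuracy(rule: StumpRule, X: Sequence[Row], y: Sequence[bool]) -> float:
    """Fraction of rows whose prediction matches the label."""
    correct = sum(rule.predict(row) == label for row, label in zip(X, y))
    return correct / len(y)


def fit_stump(X: Sequence[Row], y: Sequence[bool]) -> StumpRule:
    """Return the (feature, threshold) rule with the highest training accuracy.

    Candidate thresholds are the observed values of each metric. Ties keep the
    first rule found (lowest feature index, then lowest threshold).
    """
    if not X or len(X) != len(y):
        raise ValueError("X and y must be non-empty and of equal length")
    best: Optional[StumpRule] = None
    best_acc = -1.0
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            rule = StumpRule(f, t)
            acc = accuracy(rule, X, y)
            if acc > best_acc:
                best, best_acc = rule, acc
    assert best is not None
    return best


if __name__ == "__main__":
    # Hypothetical per-class metrics: [WMC, number of methods, lines of code].
    X_train = [
        [5, 8, 120], [12, 15, 300], [40, 35, 1500],
        [55, 42, 2100], [9, 10, 200], [60, 50, 2500],
    ]
    # Hypothetical labels: True = class flagged as a God Class by a detector.
    y_train = [False, False, True, True, False, True]
    rule = fit_stump(X_train, y_train)

    # Score the learned rule on a small held-out set, as in a train/test split.
    X_test = [[30, 25, 900], [7, 9, 150]]
    y_test = [True, False]
    print(rule, accuracy(rule, X_test, y_test))
```

A single threshold keeps the example readable. Rule learners such as JRip and tree learners such as J48 chain many such threshold tests, which lets them capture interactions between metrics that one stump cannot.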

How to Cite

[1]

Kritika, “A Deep Dive into Code Smell and Vulnerability Using Machine Learning and Deep Learning Techniques”, Int. J. Comput. Eng. Res. Trends, vol. 11, no. 4, pp. 32–45, Apr. 2024.

Issue

Vol. 11 No. 4 (2024): April 2024 Issue

Section

Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

IJCERT Policy:

The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, redistributed, and adapted in any medium or format, provided the original author is properly attributed. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.

By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.
