Detecting Phishing Websites Using Natural Language Processing
Abstract
Phishing is one of the most common cyberattack techniques, targeting both individual users and organizations. Several solutions have been proposed for detecting and preventing phishing websites, emails, and SMS messages. However, more research is needed to improve phishing detection techniques, for example by improving detection scalability and reducing false-positive and false-negative alerts. This paper proposes a website phishing detection system based on natural language processing (NLP) features such as statement, word, and character frequencies. The proposed system first enables any user to check whether a specific website is a phishing site and, second, provides a search engine that continuously (24/7) searches for phishing websites and informs the system administrator (or publishes alerts online) about them. The system is evaluated in terms of its scalability and accuracy, where accuracy is measured from the numbers of false-positive, false-negative, true-positive, and true-negative alerts.
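As a rough illustration of the kinds of features and metrics the abstract mentions, the following Python sketch computes word and character frequencies and a sentence count for a page's text, and derives accuracy from the four alert counts. It is not the authors' implementation: the function names `frequency_features` and `accuracy`, the tokenization rules, and the reading of "statements" as sentences are all assumptions made for this example.

```python
import re
from collections import Counter


def frequency_features(text):
    """Relative word and character frequencies plus a sentence count.

    Hypothetical helper: the tokenization below (lowercased alphanumeric
    words, sentences split on '.', '!' and '?') is an assumption, not the
    feature definition used in the paper.
    """
    lowered = text.lower()
    words = re.findall(r"[a-z0-9']+", lowered)
    chars = [c for c in lowered if not c.isspace()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    word_counts = Counter(words)
    char_counts = Counter(chars)
    return {
        "word_freq": {w: n / len(words) for w, n in word_counts.items()},
        "char_freq": {c: n / len(chars) for c, n in char_counts.items()},
        "sentence_count": len(sentences),
    }


def accuracy(tp, tn, fp, fn):
    """Standard accuracy: correct alerts divided by all alerts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one alert is required")
    return (tp + tn) / total


if __name__ == "__main__":
    features = frequency_features("Verify your account. Verify now!")
    print(features["word_freq"]["verify"])
    print(features["sentence_count"])
    print(accuracy(tp=40, tn=50, fp=5, fn=5))
```

In a full system, features like these would presumably feed a classifier, and the four counts would come from comparing its verdicts against labeled websites; both steps are outside the scope of this sketch.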

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.