A Survey on Various Stemming Algorithms

Sundar Singh, R K Pateriya
Computer Science & Engineering Department, Maulana Azad National Institute of Technology, Bhopal, India, 462003

Stemming is a technique that reduces words to their root form, called the stem, by removing derivational and inflectional affixes. Most existing stemming algorithms use an affix-stripping technique. Stemming is widely applied in NLP, text mining, and information retrieval. It improves the performance of information retrieval systems by decreasing the index size. Many stemming algorithms have been implemented for the English language, and many of them work successfully in information retrieval systems. However, these algorithms have drawbacks, since they cannot fully describe English morphology. In this paper, different stemming algorithms are discussed and compared in terms of their usefulness and their limitations.

Keywords: Stemming, stop word, recall, precision, Text mining, NLP, IR.
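
As a rough illustration of the affix-stripping idea described in the abstract, the following Python sketch removes a single suffix from each word and shows how this shrinks the set of index terms. It is a toy example under stated assumptions, not the Porter, Lovins, or any other published algorithm: the suffix list, the minimum stem length, and the names toy_stem and index_terms are all hypothetical choices made for this sketch.

```python
# Toy suffix-stripping stemmer: illustrative only, NOT a published algorithm.
# The suffix list and minimum stem length below are assumptions for this sketch.
SUFFIXES = ("ations", "ation", "ions", "ings", "ion", "ing", "ed", "es", "s")
MIN_STEM_LEN = 3  # refuse to strip if the remaining stem would be shorter


def toy_stem(word: str) -> str:
    """Strip the first matching suffix (longer suffixes are listed first)."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def index_terms(words, stem=False):
    """Return the set of distinct index terms, optionally after stemming."""
    return {toy_stem(w) if stem else w.lower() for w in words}


if __name__ == "__main__":
    words = ["connect", "connected", "connecting", "connection", "connections"]
    print(len(index_terms(words)), "terms without stemming")
    print(len(index_terms(words, stem=True)), "terms with stemming")

    # Limitations of pure suffix stripping:
    print(toy_stem("news"))                  # conflated with "new" (over-stemming)
    print(toy_stem("ran"), toy_stem("run"))  # irregular forms stay apart (under-stemming)
```

In this sketch the five variants of "connect" collapse to one index term, which is the index-size reduction the abstract refers to. The last two calls hint at why such rules cannot fully describe English morphology: "news" is wrongly merged with "new", while "ran" and "run" are never merged.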



