A Survey on various Stemming Algorithms

Sundar Singh; R K Pateriya

Published: May 31, 2015

Keywords:

Stemming, stop word, recall, precision, text mining, NLP, IR.

Abstract

Stemming reduces words to their root form, called the stem, by removing derivational and inflectional affixes. Most existing stemming algorithms use an affix-stripping technique. Stemming is widely applied in NLP, text mining, and information retrieval, where it improves system performance by decreasing the index size. Many stemming algorithms have been implemented for the English language, and many of them work successfully in information retrieval systems. However, these algorithms have many drawbacks, since they cannot fully describe English morphology. This paper discusses different stemming algorithms and compares them in terms of their usefulness and their limitations.
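As a rough illustration of the affix-stripping idea described in the abstract, the following minimal Python sketch removes a handful of suffixes and shows how several word forms collapse into one index term. The suffix list, the `min_stem_len` threshold, and the names `toy_stem` and `index_terms` are assumptions made for this example only; this is not the Porter, Lovins, or Paice algorithm, nor any method taken from the paper.

```python
# Toy suffix stripper for illustration only (hypothetical rules, not a published stemmer).
# Suffixes are checked in order, so longer variants come before their shorter forms.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix if at least min_stem_len characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: len(word) - len(suffix)]
    return word


def index_terms(words):
    """Return the set of distinct stems, i.e. the terms an index would store."""
    return {toy_stem(w) for w in words}


variants = ["connect", "connected", "connecting", "connection", "connections"]
print(index_terms(variants))   # five surface forms share a single stem

# Two kinds of error that such rules cannot avoid:
print(toy_stem("news"))                    # over-stemming: conflated with "new"
print(toy_stem("ran"), toy_stem("run"))    # under-stemming: irregular forms stay apart
```

The five variants of "connect" map to one index term, which is the index-size reduction the abstract refers to. The last two lines hint at why pure suffix rules cannot fully describe English morphology: unrelated words can be merged, and related irregular forms can be missed.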

How to Cite

Sundar Singh and R K Pateriya, “A Survey on various Stemming Algorithms”, Int. J. Comput. Eng. Res. Trends, vol. 2, no. 5, pp. 310–315, May 2015.

Issue

Vol. 2 No. 5 (2015): May (2015) Issue

Section

Survey

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

IJCERT Policy:

The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.

By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.
