Classification of Concept Drifting Data Streams Using Adaptive Novel-Class Detection

Ms. Aparna Yeshwantrao Ladekar
Dr. Mrs. M.Y. Joshi

Abstract

Data stream classification poses four major challenges: infinite length, concept-drift, concept-evolution, and feature-evolution. Because a data stream is effectively infinite in length, it is impractical to store all of the data and reuse it for training. Concept-drift, which is common in data streams, occurs when the underlying concept changes over time. Concept-evolution occurs when new classes emerge in the stream. Feature-evolution occurs frequently in text streams, where new features such as words or phrases appear as the stream progresses. Most existing data stream classification techniques address only the first two challenges and ignore the latter two. This work classifies concept-drifting data streams with an adaptive novel-class detection approach, in which a novel-class detector is maintained alongside the classifier to handle concept-drift and concept-evolution. The detector adapts to dynamic, evolving data streams and can detect more than one novel class simultaneously. Feature-evolution is handled with a feature set homogenization technique. Experiments on a Twitter data set show a reduced ERR rate and an increased detection rate, and the approach compares favorably with existing data stream classification techniques.
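
The abstract does not spell out how feature set homogenization works. As a minimal sketch, assuming it means mapping two feature vectors onto the union of their feature sets and filling absent features with zero, the idea might look like the following. The function name `homogenize` and the sample feature names are hypothetical and are not taken from the article.

```python
from typing import Dict, List, Tuple


def homogenize(
    a: Dict[str, float], b: Dict[str, float]
) -> Tuple[List[str], List[float], List[float]]:
    """Map two sparse feature vectors onto the union of their feature sets.

    Assumption: a feature missing from one vector is treated as 0.0.
    """
    # Sort the union so both dense vectors share one stable feature order.
    features = sorted(set(a) | set(b))
    vec_a = [a.get(f, 0.0) for f in features]
    vec_b = [b.get(f, 0.0) for f in features]
    return features, vec_a, vec_b


# Hypothetical example: a stored model vector and a newly arrived text
# instance whose vocabularies only partly overlap.
model_vector = {"stream": 2.0, "drift": 1.0}
new_instance = {"drift": 1.0, "hashtag": 3.0}

features, v_model, v_instance = homogenize(model_vector, new_instance)
print(features)
print(v_model)
print(v_instance)
```

Once both vectors live in the same feature space, a distance or similarity between them can be computed directly, which is what a classifier or novel-class detector would need when new words appear in the stream.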

Article Details

How to Cite
[1]
Ms. Aparna Yeshwantrao Ladekar and Dr. Mrs. M.Y. Joshi, “Classification of Concept Drifting Data Streams Using Adaptive Novel-Class Detection”, Int. J. Comput. Eng. Res. Trends, vol. 3, no. 9, pp. 514–520, Sep. 2016.
Section
Research Articles
