Classification of Concept Drifting Data Streams Using Adaptive Novel-Class Detection
Abstract
Data stream classification poses several challenges for the data mining community. This work addresses four major ones: concept-drift, infinite length, feature-evolution, and concept-evolution. Concept-drift, which is common in data streams, occurs when the underlying concept changes over time. Because a data stream is effectively infinite in length, it is not practical to store all of the data and reuse it for training whenever required. Feature-evolution occurs frequently in text streams, where new features such as words or phrases may appear as the stream progresses. Concept-evolution occurs when new classes emerge in the data stream. Most existing data stream classification techniques consider only the first two challenges and ignore the latter two. The approach presented here, classification of concept-drifting data streams using adaptive novel-class detection, addresses concept-drift and concept-evolution by maintaining a novel-class detector alongside the classifier. The novel-class detector is more adaptive to dynamic, evolving data streams and can detect more than one novel class simultaneously. The approach addresses feature-evolution by using a feature set homogenization technique. Experiments on a Twitter data set yielded a reduced ERR rate and an increased novel-class detection rate. These results indicate that the approach is effective compared with existing data stream classification techniques.
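As a rough illustration of what feature set homogenization can look like, the sketch below maps two instances with different feature sets onto a shared feature space by taking the union of their features and filling missing values with zero. The abstract does not specify the exact procedure, so the union-with-zero-fill rule, the function name `homogenize`, and the sample feature names are assumptions made for illustration only, not the authors' implementation.

```python
from typing import Dict, List, Tuple

# A sparse feature vector: feature name -> weight (for example, a term frequency).
FeatureVector = Dict[str, float]


def homogenize(
    model_vec: FeatureVector, test_vec: FeatureVector
) -> Tuple[List[str], List[float], List[float]]:
    """Map two vectors onto one shared feature space.

    Assumption (not stated in the abstract): the shared space is the union
    of both feature sets, and a feature absent from a vector gets value 0.0.
    """
    # Sort so the feature order is deterministic across calls.
    features = sorted(set(model_vec) | set(test_vec))
    a = [model_vec.get(f, 0.0) for f in features]
    b = [test_vec.get(f, 0.0) for f in features]
    return features, a, b


# Hypothetical example: the newer instance contains a word the model never saw.
model_vec = {"stream": 2.0, "drift": 1.0}
test_vec = {"stream": 1.0, "hashtag": 3.0}

features, a, b = homogenize(model_vec, test_vec)
print(features)  # ['drift', 'hashtag', 'stream']
print(a)         # [1.0, 0.0, 2.0]
print(b)         # [0.0, 3.0, 1.0]
```

Once both vectors have the same length and feature order, any distance-based classifier or novel-class detector can compare them directly, which is the practical point of homogenizing the feature sets.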
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.