A Scalable Real-Time Event Prediction System for Distributed Networks Using Online Random Forest and CluStream

R.Anil Kular; A Malla Reddy; K Samunnisa

doi:10.22362/ijcert/2024/v11/i6/v11i605


Published: Jun 30, 2024

DOI: https://doi.org/10.22362/ijcert/2024/v11/i6/v11i605

Keywords:

Real-time event prediction, data stream mining, distributed networks, Online Random Forest, CluStream, incremental learning, adaptive sliding window, scalability, clustering efficiency.

R.Anil Kular

Associate Professor, Department of Computer Science and Engineering, Ashoka Women’s Engineering College, Kurnool, Andhra Pradesh, India.

A Malla Reddy

Professor, Department of Information Technology, CVR College of Engineering, Hyderabad, Telangana, India.

K Samunnisa

Assistant Professor, Department of Computer Science and Engineering, Ashoka Women’s Engineering College, Kurnool, Andhra Pradesh, India.

Abstract

This paper presents an architecture for real-time event prediction in distributed networks that uses Online Random Forest (ORF) for incremental learning and CluStream for dynamic clustering. The system targets high-velocity, large-scale data streams and incorporates adaptive sliding windows and real-time data processing to maintain scalability, low latency, and accuracy. In a comparative analysis against traditional models, including Naive Bayes and Support Vector Machines, the proposed system achieves superior predictive accuracy (91.5%), precision (92%), and recall (88%), with an F1 score of 90%. CluStream improves clustering efficiency by dynamically managing evolving data streams with lower clustering time than conventional methods such as K-Means. However, latency grows with data stream size, from 120 ms for small streams (10 MB) to 850 ms for large streams (1000 MB), indicating a need for further optimization at extreme scales. The system is suitable for applications in network security, IoT monitoring, and large-scale real-time analytics. Its limitations include resource consumption and difficulty in managing highly volatile or unstructured data. Future enhancements may focus on reducing latency for larger data streams and improving adaptability to extreme concept drift. Overall, this research demonstrates a scalable, efficient, and adaptive approach to real-time event prediction in distributed environments.
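As a quick consistency check on the figures reported in the abstract, the short Python sketch below recomputes the F1 score from the stated precision and recall and compares how latency scales with stream size. It is an illustrative calculation only, not the authors' evaluation code. The function and variable names are hypothetical, and it assumes the standard definition of F1 as the harmonic mean of precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Metrics as reported in the abstract.
precision, recall = 0.92, 0.88
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")

# Latency figures as reported in the abstract (stream size in MB, latency in ms).
small_mb, small_ms = 10, 120
large_mb, large_ms = 1000, 850

data_growth = large_mb / small_mb        # how much larger the stream is
latency_growth = large_ms / small_ms     # how much slower the response is
print(f"data grows {data_growth:.0f}x, latency grows {latency_growth:.2f}x")

# Per-megabyte latency at each end of the reported range.
print(f"{small_ms / small_mb:.2f} ms/MB vs {large_ms / large_mb:.2f} ms/MB")
```

Under these assumptions the recomputed F1 agrees with the reported 90% after rounding, and the reported latency grows far more slowly than the stream size, so the scaling is sublinear even though absolute latency rises.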

How to Cite

[1]

R.Anil Kular, A Malla Reddy, and K Samunnisa, “A Scalable Real-Time Event Prediction System for Distributed Networks Using Online Random Forest and CluStream”, Int. J. Comput. Eng. Res. Trends, vol. 11, no. 6, pp. 43–56, Jun. 2024.

Issue

Vol. 11 No. 6 (2024): June (2024) Issue

Section

Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

IJCERT Policy:

The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.

By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.

