SMERAS - State Management with Efficient Resource Allocation and Scheduling in Big Data Stream Processing Systems
Abstract
Recent advances in big data and distributed computing have produced systems with capable data management, low-cost storage, and efficient resource scheduling strategies, leading to reliable and scalable system designs. These advances help in developing better decision-making systems in the era of Big Data. The data processing rate must keep pace with the data arrival rate. Existing systems use complex query execution engines in a distributed manner to process real-time and near-real-time streaming data. Data stream processing systems face challenges with respect to resource management and need efficient resource scheduling and query execution strategies. In this paper, the SMERAS model is proposed; it uses stateful stream management based on a pipeline with multiple scheduling queues for managing streams of data. Experimental results compare the performance of the proposed system with that of existing systems.
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.