A Hybrid Cloud-Based Predictive Analytics Framework: Balancing Scalability, Cost Efficiency, and Data Security in Big Data Processing

Mettu Yashwanth
Mohamed Ghouse Shukur
Dileep M R

Abstract

The exponential growth of big data presents substantial challenges for organizations that need to process and analyze large amounts of real-time and batch data efficiently while adhering to stringent data security and regulatory requirements. Traditional on-premises infrastructures, though secure, often lack the scalability and flexibility needed to manage such high-volume data, whereas fully cloud-based solutions raise concerns about data privacy and compliance. To address these issues, this paper proposes a novel hybrid cloud-based big data framework designed specifically for predictive analytics. The framework combines the scalability, elasticity, and cost efficiency of cloud platforms with the security and control provided by on-premises infrastructure. By dynamically partitioning workloads based on data sensitivity and processing requirements, the system allocates resources to suit each data-processing task. The proposed framework was evaluated on several key performance metrics covering both real-time streaming and batch data processing. The experimental results indicate that the system processes 8,000 data units per second while maintaining a latency of 30 ms for real-time analytics. In terms of cost efficiency, the framework processes data at $200 per terabyte, which the evaluation reports as a significant reduction relative to traditional solutions. Furthermore, the framework achieves a mean squared error (MSE) of 0.03 in its predictions, outperforming both on-premises and fully cloud-based systems. The flexibility of the architecture allows sensitive data to be processed securely on-premises to meet regulatory requirements (e.g., GDPR, HIPAA), whereas non-sensitive data are processed in the cloud, leveraging the cloud’s elastic computational resources.
This framework addresses key limitations of existing data infrastructure by balancing performance, security, and cost. However, two challenges are acknowledged: data transfers between the cloud and on-premises systems introduce overhead, and a hybrid environment is more complex to manage. Future research will focus on reducing both through enhanced data synchronization methods and intelligent workload orchestration. Overall, this study contributes to the growing field of hybrid cloud architectures for big data analytics, offering a scalable and secure solution that meets the demands of modern data-driven organizations.
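
The sensitivity-based workload partitioning described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Workload` fields, the `route` and `pipeline` functions, and the two-way `Target` split are hypothetical names chosen for illustration, and the only rule assumed is the one the abstract states, namely that sensitive data stay on-premises and non-sensitive data go to the cloud.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    ON_PREM = "on_prem"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Workload:
    name: str
    sensitive: bool  # hypothetical flag, e.g. data in scope for GDPR or HIPAA
    realtime: bool   # hypothetical flag: streaming (True) or batch (False)


def route(workload: Workload) -> Target:
    """Keep sensitive data on-premises; send everything else to the cloud."""
    return Target.ON_PREM if workload.sensitive else Target.CLOUD


def pipeline(workload: Workload) -> str:
    """Pick a processing mode from the workload's latency requirement."""
    return "stream" if workload.realtime else "batch"


def partition(workloads):
    """Group workload names by (target, pipeline) placement."""
    plan = {}
    for w in workloads:
        plan.setdefault((route(w), pipeline(w)), []).append(w.name)
    return plan


if __name__ == "__main__":
    jobs = [
        Workload("patient_records", sensitive=True, realtime=False),
        Workload("clickstream", sensitive=False, realtime=True),
        Workload("sales_history", sensitive=False, realtime=False),
    ]
    for placement, names in partition(jobs).items():
        print(placement[0].value, placement[1], names)
```

A production scheduler would presumably also weigh transfer overhead and available capacity, which the abstract names as open challenges; the sketch deliberately leaves those out.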

How to Cite
Mettu Yashwanth, Mohamed Ghouse Shukur, and Dileep M R, “A Hybrid Cloud-Based Predictive Analytics Framework: Balancing Scalability, Cost Efficiency, and Data Security in Big Data Processing”, Int. J. Comput. Eng. Res. Trends, vol. 11, no. 6, pp. 12–21, Jun. 2024.
Section
Research Articles
