Accurate Analytics Assurance Using an Apache Spark on Hadoop Yarn Model for Emerging Big Data Systems

Mallikarjuna Reddy Beram
Lead I, UST Global, Bengaluru, Karnataka 560066, India

Time and market trends have made information technology a driving force: automation is now needed everywhere, and data, in the form of Big Data, has become the essential raw material of today's world. This white paper therefore focuses on the data that drives decision making. In a Big Data environment, testing is the biggest challenge for Hadoop, Spark, or any other framework used to analyze data, because the analysis must give the end user a realistic picture on which decisions are made. I present a deterministic, goal-driven approach, covering both functional and non-functional aspects, to help data scientists and data engineers model data. Based on that modelling, test conditions should be written in MapReduce to verify that each node and function works as expected. The next stage of testing targets optimization and performance, for example on streaming data, where Spark plays an important role in producing good recommendations. Big Data testing therefore extends to optimization, performance, and load balancing. These should be tested in parallel with the functional aspects defined by the data scientist, because the final functional behaviour is always deterministic from the end user's point of view.

Keywords: Automation, Hadoop Framework, Spark, MapReduce, Data Scientist, Data Engineer, Optimization, Performance, Parallel Process.
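
As an illustration of a functional test condition expressed in map-reduce terms, the following is a minimal sketch in plain Python rather than on a Hadoop or Spark cluster. The word-count job, the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`, `check_job`), and the sample records are all assumptions introduced for illustration; they are not taken from the paper's own test suite.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (token, 1) pair for each whitespace-separated token in a record."""
    return [(token.lower(), 1) for token in record.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in-process on a small list of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


def check_job(records, expected):
    """Hypothetical test condition: the job output must equal the expected counts."""
    actual = run_job(records)
    return actual == expected, actual


# Illustrative sample data; the expected counts were worked out by hand.
sample = ["spark on yarn", "Spark streaming", "yarn"]
ok, counts = check_job(
    sample, {"spark": 2, "on": 1, "yarn": 2, "streaming": 1}
)
```

Because the expected output is fixed in advance, the check is deterministic. In principle the same comparison could be run against the output of a real cluster job on a small, known input.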


