Accurate Analytics Assurance Using an Apache Spark on Hadoop Yarn Model for Emerging Big Data Systems

Mallikarjuna Reddy Beram


Published: Sep 29, 2019

Keywords:

Automation, Hadoop Framework, Spark, Map Reduce, Data Scientist, Data Engineer, Optimization, Performance, Parallel Process


Abstract

Information technology now drives the market, automation is needed everywhere, and data, in the form of Big Data, has become the essential raw material of today's world. This white paper therefore focuses on the data used for decision making. In a Big Data environment, testing is the biggest challenge for Hadoop, Spark, or any other framework used to analyze data, because the analysis must give the end user a realistic picture on which decisions are based. I present a deterministic, goal-driven approach, covering both functional and non-functional goals, that data scientists and data engineers can use to model data. Based on that model, test conditions should be written in MapReduce to verify that each node and function works as expected. The next tests target optimization and performance, for example on streaming data, where Spark plays an important role in producing good recommendations. Big Data testing therefore has to cover optimization, performance, and load balancing in parallel with the functional aspects defined by the data scientist, because the end function is always deterministic from the end user's point of view.
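As a rough illustration of a functional test condition on a map-reduce style job, the sketch below runs a small word count in plain Python and compares the result with a known, deterministic expected output. It is not the paper's method: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample records are hypothetical, and no Hadoop or Spark API is used.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the shuffle step would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce over a list of input lines."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input and the deterministic result it must produce.
records = ["spark on yarn", "Spark on Hadoop"]
expected = {"spark": 2, "on": 2, "yarn": 1, "hadoop": 1}

# The functional test condition: the job output must equal the expected output.
assert run_job(records) == expected
```

The same comparison against a fixed expected result could be applied to a real cluster job by swapping `run_job` for the job under test; that substitution is an assumption and is not described in the abstract.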

How to Cite

[1]

Mallikarjuna Reddy Beram, “Accurate Analytics Assurance Using an Apache Spark on Hadoop Yarn Model for Emerging Big Data Systems”, Int. J. Comput. Eng. Res. Trends, vol. 6, no. 9, pp. 1–11, Sep. 2019.

Issue

Vol. 6 No. 9 (2019): September (2019) Issue

Section

Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

IJCERT Policy:

The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.

By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.

