Survey on Big Data using Apache Hadoop and Spark

Priya Dahiya; Chaitra.B; Usha Kumari

Published: Jun 30, 2017

Keywords:

Big data, Spark, Hadoop, HDFS, MapReduce, YARN.

Priya Dahiya

Information Science Dept., Acharya Doctor Sarvepalli Radhakrishnan Rd, Bengaluru, Karnataka 560107, India.

Chaitra.B

Information Science Dept., Acharya Doctor Sarvepalli Radhakrishnan Rd, Bengaluru, Karnataka 560107, India.

Usha Kumari

Information Science Dept., Acharya Doctor Sarvepalli Radhakrishnan Rd, Bengaluru, Karnataka 560107, India.

Abstract

Big data is growing rapidly in volume, variability, and velocity, which makes it difficult to capture, process, and analyze. Hadoop processes data with MapReduce, which consists of two phases: Map and Reduce. In contrast, Spark uses Resilient Distributed Datasets (RDDs) and a Directed Acyclic Graph (DAG) of operations to process large datasets. Both Hadoop and Spark can store data in the Hadoop Distributed File System (HDFS). This paper describes the architecture and workings of Hadoop and Spark, highlighting their differences and the challenges MapReduce faces when processing large datasets. It also explores how Spark runs on Hadoop YARN.
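The two processing models named in the abstract can be illustrated with a minimal single-process sketch in plain Python. It is not the Hadoop or Spark API and does not come from the paper. The function names `map_phase`, `shuffle_phase`, and `reduce_phase` and the class `ToyRDD` are hypothetical, chosen only for illustration. The first part mimics a word count split into Map, shuffle, and Reduce steps. The second part assumes a simplified view of an RDD as a source plus a recorded chain of transformations that runs only when an action is called; a real Spark DAG can branch and join, whereas this toy lineage is a straight line.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group values by key, the step that sits between Map and Reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: collapse each key's list of values into one result (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


class ToyRDD:
    """Hypothetical stand-in for an RDD: records transformations, runs them lazily."""

    def __init__(self, source: list, lineage: tuple = ()):
        self._source = source
        self._lineage = lineage  # ordered chain of pending transformations

    def map(self, fn: Callable) -> ToyRDD:
        # Transformation: nothing is computed, the step is only recorded.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, fn: Callable) -> ToyRDD:
        # Transformation: nothing is computed, the step is only recorded.
        return ToyRDD(self._source, self._lineage + (("filter", fn),))

    def collect(self) -> list:
        """Action: replay the recorded chain over the source and return the result."""
        data = iter(self._source)
        for kind, fn in self._lineage:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)


if __name__ == "__main__":
    lines = ["big data with hadoop", "big data with spark"]
    print(reduce_phase(shuffle_phase(map_phase(lines))))

    even_squares = ToyRDD([1, 2, 3, 4]).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(even_squares.collect())
```

In this sketch every stage runs in memory on one machine. In Hadoop MapReduce the map and reduce tasks are distributed across a cluster and intermediate results are written to disk between phases, while Spark can keep intermediate datasets in memory and recompute lost partitions from the recorded lineage.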

How to Cite

Priya Dahiya, Chaitra.B, and Usha Kumari, “Survey on Big Data using Apache Hadoop and Spark,” Int. J. Comput. Eng. Res. Trends, vol. 4, no. 6, pp. 195–201, Jun. 2017.

Issue

Vol. 4 No. 6 (2017): June (2017) Issue

Section

Survey

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

IJCERT Policy:

The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also be licensed under the same terms. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.

By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.

