Survey on Big Data using Apache Hadoop and Spark

Priya Dahiya
Chaitra.B
Usha Kumari

Abstract

Big data is growing rapidly in volume, variability, and velocity, which makes it difficult to capture, process, and analyze. Hadoop processes data with MapReduce, a model that consists of two phases: Map and Reduce. Spark, in contrast, uses Resilient Distributed Datasets (RDDs) and a Directed Acyclic Graph (DAG) of operations to process large datasets. Both Hadoop and Spark can store data in the Hadoop Distributed File System (HDFS). This paper describes the architecture and operation of Hadoop and Spark, highlights their differences, and discusses the challenges MapReduce faces when processing large datasets. It also explores how Spark runs on Hadoop YARN.
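
As a rough illustration of the two MapReduce phases named in the abstract, the following single-process Python sketch counts words. It is not taken from the paper and does not use Hadoop or Spark; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the shuffle step only stands in for the grouping that a real framework performs between Map and Reduce.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key (a stand-in for the framework's shuffle step)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data at rest"]))
```

In a real cluster the map and reduce functions would run in parallel on different nodes, with input and output typically stored in HDFS; this sketch only shows the shape of the computation.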

Article Details

How to Cite
Priya Dahiya, Chaitra.B, and Usha Kumari, “Survey on Big Data using Apache Hadoop and Spark”, Int. J. Comput. Eng. Res. Trends, vol. 4, no. 6, pp. 195–201, Jun. 2017.
Section
Survey
