BIG DATA ANALYSIS ON YOUTUBE USING HADOOP AND MAPREDUCE
Abstract
We live in a digitalized world today. Every digital service we use generates an enormous amount of data, and this data is called Big Data. According to Wikipedia, big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information privacy. Google's video streaming service, YouTube, is one of the best examples of a service that produces a huge quantity of data in a very short period. In this work, Hadoop and MapReduce are used to mine this enormous quantity of data and to measure performance. Hadoop is a system that provides reliable shared storage for such huge datasets on the cloud, together with an analysis system: storage is provided by HDFS (Hadoop Distributed File System) and analysis by MapReduce. MapReduce is a programming model and an associated implementation for processing large data sets. This paper presents the algorithmic treatment of a big data problem and its solution, using a Hadoop cluster with HDFS to store the YouTube dataset and the MapReduce programming framework to process the large data sets in parallel. We solve two problem statements on the YouTube dataset: finding the top 5 video categories (genres) with the maximum number of uploaded videos, and finding the top 5 video uploaders on YouTube. A particularly distinguishing feature of this paper is its focus on analytics performed on unstructured data, which constitutes 95% of big data.
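Both problem statements are "count per key, then rank" jobs, which is the classic MapReduce pattern: the mapper emits (key, 1) for each video, the framework groups the pairs by key, and the reducer sums them. The sketch below simulates those three phases in memory in plain Python so the data flow can be read end to end; it is not the authors' Hadoop job. It assumes a tab-separated record layout with the uploader in the second field and the category in the fourth field (the constants `UPLOADER_COL` and `CATEGORY_COL` and the sample records are hypothetical and should be adjusted to the real dataset).

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical column indices for a tab-separated record layout:
# video_id, uploader, age, category, ... (adjust to the real dataset).
UPLOADER_COL = 1
CATEGORY_COL = 3


def map_phase(lines: Iterable[str], col: int) -> Iterator[tuple[str, int]]:
    """Mapper: emit (key, 1) for the chosen field of each record; skip malformed lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > col and fields[col].strip():
            yield fields[col].strip(), 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group mapper output by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: sum the 1s emitted for each key into a total count."""
    return {key: sum(values) for key, values in groups.items()}


def top_n(lines: Iterable[str], col: int, n: int = 5) -> list[tuple[str, int]]:
    """Run map -> shuffle -> reduce, then keep the n keys with the highest counts."""
    counts = reduce_phase(shuffle_phase(map_phase(lines, col)))
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Tiny made-up sample records, for illustration only.
SAMPLE = [
    "v1\talice\t10\tMusic\t200",
    "v2\tbob\t12\tComedy\t180",
    "v3\talice\t15\tMusic\t240",
    "v4\tcarol\t20\tSports\t300",
    "v5\tbob\t22\tMusic\t150",
    "malformed line",
]

print(top_n(SAMPLE, CATEGORY_COL))  # [('Music', 3), ('Comedy', 1), ('Sports', 1)]
print(top_n(SAMPLE, UPLOADER_COL))  # [('alice', 2), ('bob', 2), ('carol', 1)]
```

On an actual Hadoop cluster the mapper and reducer run as distributed tasks over HDFS blocks. With several reducers each one sees only part of the keys, so a global top 5 typically needs either a single reducer or a small second pass over the reduced counts.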

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
IJCERT Policy:
The published work presented in this paper is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This means that the content of this paper can be shared, copied, and redistributed in any medium or format, as long as the original author is properly attributed. Additionally, any derivative works based on this paper must also credit the original author. This licensing agreement allows for broad dissemination and use of the work while maintaining the author's rights and recognition.
By submitting this paper to IJCERT, the author(s) agree to these licensing terms and confirm that the work is original and does not infringe on any third-party copyright or intellectual property rights.
References
Webster, John. "MapReduce: Simplified Data Processing on Large Clusters". Search Storage, 2004. Retrieved 25 March 2013. https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf
Pyne, Saumyadipta; Prakasa Rao, B.L.S.; Rao, S.B. Big Data Analytics: Methods and Applications.
"YouTube Company Statistics". https://www.statisticbrain.com/youtube-statistics/
YouTube. "YouTube for Media". YouTube.com, 2017. https://www.youtube.com/yt/about/press/
"Big data". Wikipedia. https://en.wikipedia.org/wiki/Big_data
Kallerhoff, Phillip. "Big Data and Credit Unions: Machine Learning in Member Transactions". https://filene.org/assets/pdfreports/301_Kallerhoff_Machine_Learning.pdf
Marr, Bernard. "Why only one of the 5 Vs of big data really matters". http://www.ibmbigdatahub.com/blog/why-only-one-5-vs-big-data-really-matters
Information Resources Management Association (IRMA). "Chapter 1 - Big Data Overview". In: Big Data: Concepts, Methodologies, Tools, and Applications, Volume I. IGI Global, 2016. http://common.books24x7.com/toc.aspx?bookid=114046
Apache Hadoop
"How To Analyze Big Data With Hadoop Technologies". 3pillarglobal.com, 2017. https://www.3pillarglobal.com/insights/analyze-bigdata-hadoop-technologies
Dean, J.; Ghemawat, S. "MapReduce: Simplified Data Processing on Large Clusters". In: OSDI'04, 6th Symposium on Operating Systems Design and Implementation, sponsored by USENIX, in cooperation with ACM SIGOPS, 2004, pp. 137–150.
"Big Data Tutorial 1: MapReduce". https://wikis.nyu.edu/display/NYUHPC/Big+Data+Tutorial+1%3A+MapReduce
MacLean, Diana. "A Very Brief Introduction to MapReduce". http://hci.stanford.edu/courses/cs448g/a2/files/map_reduce_tutorial.pdf
Edureka. "Install Hadoop: Setting up a single node cluster". https://www.edureka.co/blog/install-hadoopsingle-node-hadoop-cluster