
International Journal
of Computer Engineering in Research Trends (IJCERT)

Scholarly, Peer-Reviewed, Open Access and Multidisciplinary



ISSN (Online): 2349-7084


BIG DATA ANALYSIS ON YOUTUBE USING HADOOP and MAPREDUCE

Soma Hota
Affiliations
Amity School of Engineering and Technology - Computer Science Engineering, Amity University, Mumbai - Pune
DOI: 10.22362/ijcert/2018/v5/i4/v5i403


Abstract
We live in a digitalized world in which every digital service we use generates an enormous amount of data, commonly called Big Data. According to Wikipedia, big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information privacy. Google's video streaming service, YouTube, is one of the best examples of a service that produces a huge quantity of data in a very short period. In this work, data mining of such an enormous quantity of data is performed using Hadoop and MapReduce to measure performance. Hadoop is a system that provides reliable shared storage for such huge datasets on the cloud, together with an analysis system: storage is provided by HDFS (Hadoop Distributed File System) and analysis by MapReduce, a programming model and an associated implementation for processing large data sets. This paper presents algorithmic work on a big data problem and its optimal solution, using a Hadoop cluster and HDFS to store a YouTube dataset and the MapReduce programming framework to process it in parallel. We solve two problem statements using the YouTube dataset: finding the top 5 video categories (genres) with the maximum number of videos uploaded, and finding the top 5 video uploaders on YouTube. A particularly distinguishing feature of this paper is its focus on analytics performed on unstructured data, which constitutes 95% of big data.
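The two problem statements in the abstract can be illustrated with a minimal, single-process sketch of the map, shuffle, and reduce phases. It is not the paper's implementation and does not run on Hadoop. It only shows the counting pattern, under several assumptions that do not come from this page: records are tab-separated, the uploader and category sit at the hypothetical column positions `UPLOADER_COL` and `CATEGORY_COL`, "top uploaders" means uploaders with the most videos, and the sample rows are invented for illustration.

```python
from collections import defaultdict

# Hypothetical column positions in a tab-separated record; the abstract
# does not describe the dataset layout, so these are assumptions.
UPLOADER_COL = 1
CATEGORY_COL = 3


def map_field(line, col):
    """Map phase: emit (key, 1) for the chosen column, skipping malformed rows."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > col and fields[col]:
        yield fields[col], 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def top_n(lines, col, n=5):
    """Run map, shuffle, and reduce, then keep the n keys with the highest counts."""
    pairs = (pair for line in lines for pair in map_field(line, col))
    counts = reduce_counts(shuffle(pairs))
    # Sort by descending count, breaking ties alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Invented sample rows: video id, uploader, views, category.
sample = [
    "v1\talice\t100\tMusic",
    "v2\tbob\t200\tComedy",
    "v3\talice\t50\tMusic",
    "v4\tcarol\t10\tSports",
    "bad row",  # too few columns, so the map phase skips it
]

top_categories = top_n(sample, CATEGORY_COL)
top_uploaders = top_n(sample, UPLOADER_COL)
```

On a real cluster the map and reduce functions would run in parallel over HDFS blocks and the framework would perform the shuffle. Here the three phases are plain function calls so the data flow is easy to follow.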


Citation
Soma Hota (2018). BIG DATA ANALYSIS ON YOUTUBE USING HADOOP and MAPREDUCE. International Journal of Computer Engineering In Research Trends, 5(4), 98-104. Retrieved from http://ijcert.org/ems/ijcert_papers/V5I403.pdf


Keywords : Big Data definition, Data mining, YouTube data analysis, Hadoop, HDFS, MapReduce, unstructured dataset analysis.

References
1.	Webster, John. "MapReduce: Simplified Data Processing on Large Clusters", "Search Storage", 2004. Retrieved on 25 March 2013. https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf

2.	Pyne, Saumyadipta; Prakasa Rao, B.L.S.; Rao, S.B. Big Data Analytics: Methods and Applications.

3.	YouTube Company Statistics. https://www.statisticbrain.com/youtube-statistics/

4.	Youtube.com, 2017. YouTube for media. https://www.youtube.com/yt/about/press/

5.	Big data; Wikipedia. https://en.wikipedia.org/wiki/Big_data

6.	Kallerhoff, Phillip. "Big Data and Credit Unions: Machine Learning in Member Transactions". https://filene.org/assets/pdfreports/301_Kallerhoff_Machine_Learning.pdf

7.	Marr, Bernard. "Why only one of the 5 Vs of big data really matters". http://www.ibmbigdatahub.com/blog/why-only-one-5-vs-big-data-really-matters

8.	Information Resources Management Association (IRMA). 2016. "Chapter 1 - Big Data Overview". Big Data: Concepts, Methodologies, Tools, and Applications, Volume I. IGI Global. http://common.books24x7.com/toc.aspx?bookid=114046

9.	Apache Hadoop. http://hadoop.apache.org/

10.	How To Analyze Big Data With Hadoop Technologies; 3pillarglobal.com. 2017. https://www.3pillarglobal.com/insights/analyze-big-data-hadoop-technologies

11.	J. Dean, S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, in: OSDI'04, 6th Symposium on Operating Systems Design and Implementation, Sponsored by USENIX, in cooperation with ACM SIGOPS, 2004, pp. 137–150.

12.	Big Data Tutorial 1: MapReduce. https://wikis.nyu.edu/display/NYUHPC/Big+Data+Tutorial+1%3A+MapReduce

13.	MacLean, Diana. "A Very Brief Introduction to MapReduce". http://hci.stanford.edu/courses/cs448g/a2/files/map_reduce_tutorial.pdf

14.	Edureka. "Install Hadoop: Setting up a single node cluster". https://www.edureka.co/blog/install-hadoop-single-node-hadoop-cluster


DOI Link : https://doi.org/10.22362/ijcert/2018/v5/i4/v5i403

