Assessing the Performance of Python Data Visualization Libraries: A Review

Addepalli Lavanya
Lokhande Gaurav
Sakinam Sindhuja
Hussain Seam
Mookerjee Joydeep
Vamsi Uppalapati
Waqas Ali
Vidya Sagar S.D

Abstract

Python is one of the most widely used programming languages for data analysis, visualization, and machine learning. One of its key strengths is a rich library ecosystem that provides powerful data visualization tools. Several Python data visualization libraries have emerged in recent years, which makes it challenging for data analysts and scientists to choose the right library for their needs. This paper therefore assesses the performance of Python data visualization libraries and reviews their strengths and limitations. We begin with an overview of the most popular libraries: Matplotlib, Seaborn, Plotly, Bokeh, Altair, and ggplot. We then evaluate each library in terms of functionality, ease of use, flexibility, and speed. We also assess the visual quality of the plots each library produces and compare them to industry standards. The evaluation covers various datasets and use cases, including large and small datasets, static and interactive visualizations, and different plot types such as scatter plots, line plots, bar charts, and heatmaps. Our findings suggest that each library has distinct strengths and limitations, so no single library fits all visualization needs. Matplotlib, Seaborn, and Plotly are the most widely used. Matplotlib is a powerful and flexible library that offers a broad range of plotting options, making it well suited to complex and customized plots. Seaborn is a high-level library that simplifies plotting by providing a consistent interface and easy-to-use functions. Plotly is an interactive visualization library offering rich features for creating web-based visualizations and dashboards. We also find that Bokeh, Altair, and ggplot are less popular but offer unique features and functionality.
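
The abstract does not give the benchmarking procedure, so the following is only a minimal sketch of how the speed of a plotting library might be compared across small and large datasets. It is not the authors' method. The helper names (make_points, time_render, matplotlib_scatter), the dataset sizes, and the choice of a median over three repeats are all assumptions made for illustration. The timing harness uses only the standard library; the Matplotlib renderer is one example of a function that could be passed to it.

```python
import io
import random
import statistics
import time


def make_points(n, seed=0):
    """Generate n reproducible (x, y) pairs for a synthetic scatter dataset."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [rng.random() for _ in range(n)]
    return xs, ys


def time_render(render, sizes=(1_000, 100_000), repeats=3):
    """Return {size: median seconds} for calling render(xs, ys) at each size.

    The sizes and repeat count are illustrative assumptions, not values
    taken from the paper.
    """
    results = {}
    for n in sizes:
        xs, ys = make_points(n)
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            render(xs, ys)
            samples.append(time.perf_counter() - start)
        results[n] = statistics.median(samples)
    return results


def matplotlib_scatter(xs, ys):
    """Render a static scatter plot to an in-memory PNG with Matplotlib."""
    # Imported lazily so the harness above works without Matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(xs, ys, s=2)
    fig.savefig(io.BytesIO(), format="png")
    plt.close(fig)  # release the figure so repeated runs do not accumulate memory
```

With Matplotlib installed, time_render(matplotlib_scatter) would return a dictionary mapping each dataset size to a median render time in seconds. An equivalent renderer for Seaborn, Plotly, Bokeh, or Altair could be passed to the same harness to compare libraries under identical inputs.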

How to Cite
[1] Addepalli Lavanya, “Assessing the Performance of Python Data Visualization Libraries: A Review”, Int. J. Comput. Eng. Res. Trends, vol. 10, no. 1, pp. 28–39, Jan. 2023.
Section
Reviews
