VLSI-Based Parallel CNN Accelerator with Quantization for High-Performance Edge Intelligence

Jameer Shaik
Anumolu Lasmika
Nalukurthi Sumalatha
Vijaya Lakshmi.C

Abstract

Convolutional Neural Networks (CNNs) have high computational and energy requirements, which creates demand for efficient hardware. CPU- and GPU-based systems often suffer from high latency and power consumption, particularly on edge devices. This study designs and implements an energy-efficient VLSI-based CNN hardware accelerator for real-time image classification. The proposed approach combines a lightweight CNN model with a VLSI hardware design that uses parallel processing elements, optimized dataflow, and fixed-point quantization. The system is evaluated on the CIFAR-10 dataset, which consists of 60,000 images across 10 classes. Normalization and data augmentation are applied during preprocessing, and the trained model is mapped onto the hardware through an efficient pipeline. Experimental results show an accuracy of 94.8%, precision of 94.1%, recall of 93.6%, and F1-score of 93.8%. Compared to conventional approaches, the design reduces latency and power consumption while maintaining high throughput. Quantization significantly improves energy efficiency with minimal impact on accuracy. The proposed accelerator therefore offers a practical solution for real-time edge AI applications, balancing performance and energy efficiency, and contributes to the development of scalable, hardware-efficient deep learning systems.
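To make the fixed-point quantization mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state word lengths, so the 8-bit signed format with 6 fractional bits is an assumption, and the function names (`quantize_fixed_point`, `fixed_point_dot`) are hypothetical. It shows how weights and activations could be rounded to integers with saturation, and how a multiply-accumulate (MAC) unit could then operate on integers only.

```python
def quantize_fixed_point(values, total_bits=8, frac_bits=6):
    """Round floats to signed fixed-point codes, saturating at the range limits.

    total_bits and frac_bits are assumed values; the abstract does not give them.
    """
    scale = 1 << frac_bits                    # 2**frac_bits
    q_min = -(1 << (total_bits - 1))          # most negative representable code
    q_max = (1 << (total_bits - 1)) - 1       # most positive representable code
    return [max(q_min, min(q_max, round(v * scale))) for v in values]


def fixed_point_dot(w_codes, x_codes):
    """Integer multiply-accumulate over two equal-length code vectors."""
    return sum(w * x for w, x in zip(w_codes, x_codes))


# Illustrative values only; they are not taken from the paper.
weights = [0.5, -0.25, 0.125]
activations = [1.0, 0.5, -0.75]

w_q = quantize_fixed_point(weights)
x_q = quantize_fixed_point(activations)

# Each product carries 2 * frac_bits fractional bits, so rescale by 2**12.
acc = fixed_point_dot(w_q, x_q)
result = acc / (1 << 12)

reference = sum(w * x for w, x in zip(weights, activations))
print(result, reference)
```

In this sketch the accumulator is kept wider than the operands, because each product of two 6-fractional-bit codes carries 12 fractional bits. Values outside the representable range are clipped rather than wrapped, which is one common choice for hardware MAC datapaths.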

Article Details

How to Cite
[1]
Jameer Shaik, Anumolu Lasmika, Nalukurthi Sumalatha, and Vijaya Lakshmi.C, “VLSI-Based Parallel CNN Accelerator with Quantization for High-Performance Edge Intelligence”, Int. J. Comput. Eng. Res. Trends, vol. 13, no. 3, pp. 29–39, Mar. 2026.
Section
Research Articles
