
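Research on the Design and Optimization Method of a CNN Accelerator Based on HLS Tools

Application of Electronic Technique, 2021, Issue 3

Cheng Jiafeng, Wang Hongliang

National Key Laboratory for Electronic Measurement Technology, North University of China, Taiyuan 030051, Shanxi, China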

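Abstract: Following a hardware/software co-design approach, a convolutional neural network accelerator is designed and implemented on the PYNQ-Z2 platform using HLS tools. The convolution operation is optimized by matrix cutting, which balances resource consumption against computing resources so that the accelerator reaches its best performance. The accelerator IP core was tested on the MNIST dataset. The results show a speedup of 5.785 over the ARM platform for a single image and a speedup of 9.72 for 1 000 images; the accelerator's advantage is expected to keep growing as the number of test images increases.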

Keywords: convolutional neural network; PYNQ-Z2; HLS tool; accelerator

CLC number: TN108.1
Document code: A
DOI: 10.16157/j.issn.0258-7998.200841
Chinese citation format: 程佳风，王红亮. 基于HLS工具的CNN加速器的设计与优化方法研究[J]. 电子技术应用，2021，47(3)：18-21，26.
English citation format: Cheng Jiafeng, Wang Hongliang. Research on the design and optimization method of CNN accelerator based on HLS tools[J]. Application of Electronic Technique, 2021, 47(3): 18-21, 26.

Research on the design and optimization method of CNN accelerator based on HLS tools

Cheng Jiafeng, Wang Hongliang

National Key Laboratory for Electronic Measurement Technology, North University of China, Taiyuan 030051, China

Abstract: Based on the idea of software and hardware co-design, this article uses HLS tools to design and implement a convolutional neural network accelerator on the PYNQ-Z2 platform, and applies a matrix cutting optimization to the convolution operations to balance resource consumption and computing resources, so that the performance of the accelerator is optimized. The MNIST data set is used to test the performance of the accelerator IP core. The experimental results show that, for a single-image test, the accelerator achieves a speedup of 5.785 over the ARM platform, and for a 1 000-image test the speedup reaches 9.72. As the number of test images continues to increase, the performance of the accelerator is expected to improve further.

Key words: convolutional neural network (CNN); PYNQ-Z2; HLS tool; accelerator

0 Introduction

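In recent years, convolutional neural networks (CNNs) have been applied ever more widely and in increasingly complex scenarios. Their compute-intensive and memory-intensive nature has become more pronounced and now limits fast, efficient CNN implementation. Accelerator platforms based on GPUs^[1], ASICs^[2], and FPGAs^[3] have therefore been proposed in succession to improve CNN performance. GPUs consume a great deal of power and have a fixed hardware structure, which limits the use of CNNs on embedded devices. ASICs are extremely costly to develop and inflexible, making them a poor fit for complex, fast-changing CNNs. FPGAs offer low power consumption, high performance, and good flexibility, and are therefore better suited to developing CNN hardware accelerators. However, the high entry barrier and relatively long development cycle of Verilog HDL have held back the adoption of FPGAs in CNN applications^[4-5].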

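Following a hardware/software co-design approach, this paper uses HLS tools to implement a CNN accelerator on the PYNQ-Z2 and optimizes the convolution kernel computation with a matrix-cutting design method.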
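The excerpt does not spell out the matrix-cutting scheme, so the following is only a minimal sketch of one common reading: the output feature map is cut into fixed-size tiles, and each tile is computed from a small input patch copied into a local buffer, as an HLS design might do with on-chip memory. All sizes (`K`, `TILE`), the function names `conv_naive`, `conv_tiled`, and `max_abs_diff`, and the single-channel layout are assumptions made for illustration, not details from the paper.

```cpp
#include <cmath>

// All sizes below are hypothetical, chosen for illustration only.
constexpr int IN_H = 28, IN_W = 28;   // MNIST-sized single-channel input
constexpr int K = 5;                  // assumed kernel edge
constexpr int OUT_H = IN_H - K + 1;   // valid padding, stride 1
constexpr int OUT_W = IN_W - K + 1;
constexpr int TILE = 8;               // assumed tile edge
constexpr int BUF = TILE + K - 1;     // input patch edge needed per output tile

static_assert(OUT_H % TILE == 0 && OUT_W % TILE == 0,
              "TILE must divide the output size in this sketch");

// Reference: direct 2D convolution over the whole image.
void conv_naive(const float in[IN_H][IN_W], const float w[K][K],
                float out[OUT_H][OUT_W]) {
    for (int r = 0; r < OUT_H; ++r)
        for (int c = 0; c < OUT_W; ++c) {
            float acc = 0.0f;
            for (int i = 0; i < K; ++i)
                for (int j = 0; j < K; ++j)
                    acc += in[r + i][c + j] * w[i][j];
            out[r][c] = acc;
        }
}

// Tiled version: each TILE x TILE block of outputs is computed from a
// BUF x BUF input patch held in a small local array.
void conv_tiled(const float in[IN_H][IN_W], const float w[K][K],
                float out[OUT_H][OUT_W]) {
    float buf[BUF][BUF];  // stands in for an on-chip buffer
    for (int tr = 0; tr < OUT_H; tr += TILE)
        for (int tc = 0; tc < OUT_W; tc += TILE) {
            // Load the input patch that this output tile depends on.
            for (int i = 0; i < BUF; ++i)
                for (int j = 0; j < BUF; ++j)
                    buf[i][j] = in[tr + i][tc + j];
            // Compute the tile. In an HLS flow, loop directives
            // (for example pipelining) would typically be applied here.
            for (int r = 0; r < TILE; ++r)
                for (int c = 0; c < TILE; ++c) {
                    float acc = 0.0f;
                    for (int i = 0; i < K; ++i)
                        for (int j = 0; j < K; ++j)
                            acc += buf[r + i][c + j] * w[i][j];
                    out[tr + r][tc + c] = acc;
                }
        }
}

// Largest element-wise difference between two output maps.
float max_abs_diff(const float a[OUT_H][OUT_W], const float b[OUT_H][OUT_W]) {
    float m = 0.0f;
    for (int r = 0; r < OUT_H; ++r)
        for (int c = 0; c < OUT_W; ++c) {
            float d = std::fabs(a[r][c] - b[r][c]);
            if (d > m) m = d;
        }
    return m;
}
```

In a sketch like this, the tile edge is the tuning knob: a larger `TILE` needs a larger local buffer but reloads fewer overlapping input rows and columns, while a smaller one saves memory at the cost of more loads. That is one plausible form of the trade-off between resource consumption and computing resources that the abstract describes.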

For the full text of this article, download: http://www.chinaaet.com/resource/share/2000003402

