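Research on Low-Bit-Width Convolutional Neural Networks Based on Floating-Gate Devices - AET - Application of Electronic Technique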


Information Technology and Network Security

Chen Yaqian, Huang Lu

(School of Microelectronics, University of Science and Technology of China, Hefei, Anhui 230026, China)

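Abstract: Floating-gate devices (Flash) combine storage and computation and can therefore realize computing-in-memory, but a single floating-gate cell can store at most 4 bits of data. Targeting Nor Flash, this paper studies low-bit-width quantization of convolutional neural network parameters and applies quantization-aware training to the classic AlexNet, VGGNet and ResNet. With asymmetric quantization, the model parameters are quantized from 32-bit floating point to 4-bit fixed point, shrinking the model to 1/8 of its original size; on the CIFAR-10 dataset, the accuracy of the 4-bit quantized models is less than 2% lower than that of the full-precision networks. Finally, the quantized models are accelerated with a Nor Flash array. Hspice simulation results show that, compared with the full-precision model, the quantized model deployed in the Nor Flash array loses only 2.25% accuracy, which verifies the feasibility of deploying convolutional neural networks on Nor Flash.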
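Asymmetric (affine) quantization maps each float x to an unsigned integer code q through a scale s and a zero point z, so that x ≈ s(q - z). The sketch below is a minimal per-tensor version in plain Python, assuming min/max calibration and 4-bit codes 0 to 15. The excerpt does not say whether the authors quantize per tensor or per channel, or how they calibrate the range, so the function names and these choices are illustrative assumptions rather than the paper's implementation.

```python
"""Per-tensor asymmetric (affine) quantization to 4-bit unsigned integers.

Illustrative sketch: min/max calibration and the function names are
assumptions, not the paper's implementation.
"""


def asymmetric_params(values, num_bits=4):
    """Return (scale, zero_point) mapping [min, max] onto 0 .. 2**num_bits - 1."""
    qmax = 2 ** num_bits - 1
    lo = min(min(values), 0.0)  # include 0.0 so that zero is exactly representable
    hi = max(max(values), 0.0)
    if hi == lo:  # all-zero tensor: any scale works
        return 1.0, 0
    scale = (hi - lo) / qmax
    zero_point = max(0, min(qmax, round(-lo / scale)))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=4):
    """Float -> unsigned integer code: q = clamp(round(x / s) + z, 0, qmax)."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Integer code -> approximate float: x ~= s * (q - z)."""
    return [scale * (q - zero_point) for q in codes]


if __name__ == "__main__":
    weights = [-0.42, -0.10, 0.0, 0.07, 0.31, 0.55]
    s, z = asymmetric_params(weights)
    codes = quantize(weights, s, z)
    print("scale:", s, "zero point:", z)
    print("4-bit codes:", codes)  # [0, 4, 6, 7, 11, 15]
    print("dequantized:", [round(v, 3) for v in dequantize(codes, s, z)])
```

Keeping 0.0 inside the calibrated range makes zero exactly representable (it maps to q = z), which matters for zero padding and ReLU outputs. Storing 4-bit codes instead of 32-bit floats gives the 8x size reduction stated in the abstract, plus a few bytes per tensor for s and z.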

Key words: convolutional neural network; quantization; computing-in-memory; Nor Flash

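CLC number: TP183
Document code: A
DOI: 10.19358/j.issn.2096-5133.2021.06.007
Citation: Chen Yaqian, Huang Lu. Research on low-bit-width convolutional neural networks based on floating-gate devices[J]. Information Technology and Network Security, 2021, 40(6): 38-42.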

Quantization research of convolutional neural networks oriented to Nor Flash

Chen Yaqian, Huang Lu

(School of Microelectronics, University of Science and Technology of China, Hefei 230026, China)

Abstract: Flash is one of the most promising candidates for building processing-in-memory (PIM) structures. However, a single flash cell can store at most 4 bits of data. Targeting Nor Flash, this article studies the quantization of convolutional neural networks. It applies quantization-aware training to the classic AlexNet, VGGNet and ResNet, using asymmetric quantization to quantize the model parameters from 32-bit floating point to 4-bit fixed point, which shrinks the model to 1/8 of its original size. On the CIFAR-10 dataset, the accuracy of the 4-bit quantized models is less than 2% lower than that of the full-precision networks. Finally, the quantized convolutional neural network models are accelerated with a Nor Flash array. Hspice simulation results show that the accuracy of the quantized model deployed in the Nor Flash array is only 2.25% lower than that of the full-precision model, verifying the feasibility of deploying convolutional neural networks on Nor Flash.
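Quantization-aware training typically keeps full-precision "shadow" weights, runs the forward pass with fake-quantized (quantized, then immediately dequantized) weights, and passes gradients straight through the non-differentiable rounding step. The plain-Python sketch below illustrates that loop on a toy linear model. The model, loss, learning rate, and the straight-through estimator itself are assumptions chosen for illustration; this excerpt does not describe the authors' actual training setup for AlexNet, VGGNet or ResNet.

```python
"""Toy quantization-aware training (QAT) loop with a straight-through estimator.

Illustrative assumptions, not the paper's setup: one linear layer y = w . x,
per-tensor asymmetric 4-bit fake quantization of w, squared-error loss, SGD.
"""
import random

NUM_BITS = 4
QMAX = 2 ** NUM_BITS - 1  # unsigned levels 0..15


def fake_quantize(w):
    """Quantize w to 4-bit levels and immediately dequantize (forward pass only)."""
    lo, hi = min(min(w), 0.0), max(max(w), 0.0)
    scale = (hi - lo) / QMAX if hi > lo else 1.0
    zp = max(0, min(QMAX, round(-lo / scale)))
    return [scale * (max(0, min(QMAX, round(v / scale) + zp)) - zp) for v in w]


def train_qat(data, dim, steps=2000, lr=0.05, seed=0):
    """Train full-precision shadow weights while the forward pass uses 4-bit weights."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    for _ in range(steps):
        x, y = rng.choice(data)
        wq = fake_quantize(w)
        err = sum(a * b for a, b in zip(wq, x)) - y
        # Straight-through estimator: treat d(wq)/d(w) as 1, so the gradient of
        # err**2 with respect to wq (2 * err * x) is applied to w unchanged.
        w = [wi - lr * 2 * err * xi for wi, xi in zip(w, x)]
    return w, fake_quantize(w)


def make_data(target, n=200, seed=1):
    """Noise-free samples from y = target . x with x uniform in [-1, 1]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in target]
        data.append((x, sum(t * xi for t, xi in zip(target, x))))
    return data


if __name__ == "__main__":
    target = [0.5, -0.25, 0.75]
    shadow, quantized = train_qat(make_data(target), dim=len(target))
    print("shadow weights:    ", [round(v, 3) for v in shadow])
    print("4-bit weights used:", [round(v, 3) for v in quantized])
```

Because the forward pass always sees 4-bit weights, the shadow weights settle where their quantized versions fit the data, rather than where the full-precision optimum lies; this is what lets a QAT model lose less accuracy than quantizing a trained float model after the fact.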

Key words: convolutional neural network; quantization; computing-in-memory; Nor Flash

0 Introduction

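Convolutional neural networks (CNNs) are widely used in image recognition and related fields. As networks grow deeper, CNN models carry ever more parameters. AlexNet[1], for example, consists of 5 convolutional layers and 3 fully connected layers with more than 50 million parameters, and its full-precision model needs 250 MB of storage; the more powerful VGG[2] and ResNet[3] networks are far deeper than AlexNet, and VGG also has far more parameters. For such networks, every inference requires reading and computing on millions of parameters, and these memory accesses both slow down computation and drive up power consumption. Because hardware based on the von Neumann architecture separates compute units from memory units, it runs into the memory wall when running CNN models: the time and energy spent moving data back and forth far exceed those spent on the computation itself.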
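As a quick check of the storage figures, 250 MB at 4 bytes per 32-bit parameter corresponds to about 62.5 million parameters (taking 1 MB = 10^6 bytes, an assumption), consistent with "more than 50 million". The sketch below computes raw weight storage at a given bit width; the parameter count is a hypothetical value back-computed from the 250 MB figure, and the calculation ignores quantization metadata such as per-tensor scales and zero points, so real savings are slightly smaller than the ideal ratio.

```python
def weight_storage_bytes(num_params: int, bits_per_param: int) -> int:
    """Raw storage for num_params values at bits_per_param bits each, rounded up to whole bytes."""
    return (num_params * bits_per_param + 7) // 8


# Hypothetical count implied by the 250 MB figure (62.5e6 params x 4 bytes each).
IMPLIED_PARAMS = 62_500_000

fp32_bytes = weight_storage_bytes(IMPLIED_PARAMS, 32)
int4_bytes = weight_storage_bytes(IMPLIED_PARAMS, 4)
print(f"FP32: {fp32_bytes / 1e6:.2f} MB, 4-bit: {int4_bytes / 1e6:.2f} MB, "
      f"ratio {fp32_bytes / int4_bytes:.0f}x")
# prints: FP32: 250.00 MB, 4-bit: 31.25 MB, ratio 8x
```

The 8x ratio is simply 32 bits / 4 bits, which is where the "1/8 of the original size" figure in the abstract comes from.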

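Compared with von Neumann hardware, computing-in-memory hardware merges the compute and memory units, which greatly reduces data transfer and thus lowers power consumption and speeds up computation[4]. Deploying deep convolutional neural networks on computing-in-memory hardware is therefore very promising. The main devices currently used to realize computing-in-memory are phase-change memory (PCM)[5], resistive random-access memory (ReRAM)[6] and floating-gate Flash devices; among these, Flash has attracted wide attention because its manufacturing process is mature.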
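The core idea of computing-in-memory on a Flash array can be illustrated with an idealized model: each cell's stored level acts as a conductance, inputs are applied as voltages on the rows, and the currents flowing into each shared column line add up (Kirchhoff's current law), so a single read of the array yields a whole matrix-vector product. The sketch below assumes perfectly linear, noise-free cells holding unsigned 4-bit levels 0 to 15 and non-negative inputs (for example, post-ReLU activations); it is not the authors' Hspice circuit, and the function names are hypothetical. It also shows one way asymmetric quantization could fit such an array: because w ≈ s(q - z), the zero-point term s·z·Σx is identical for every column and can be subtracted digitally after readout.

```python
"""Idealized model of a Flash array computing a matrix-vector product.

Assumptions (illustrative only): each cell stores an unsigned 4-bit level
0..15 as a proportional conductance, inputs are non-negative voltages, cells
are linear and noise-free, and column currents add (Kirchhoff's current law).
Weights share one per-tensor (scale, zero_point): w ~= scale * (q - zero_point).
"""
from __future__ import annotations


def array_column_currents(levels: list[list[int]], inputs: list[float]) -> list[float]:
    """Analog step: current on column j = sum_i inputs[i] * levels[i][j]."""
    cols = len(levels[0])
    return [sum(x * row[j] for x, row in zip(inputs, levels)) for j in range(cols)]


def array_matvec(levels, inputs, scale, zero_point):
    """Digital correction after readout: y_j = scale * (I_j - zero_point * sum(x))."""
    currents = array_column_currents(levels, inputs)
    input_sum = sum(inputs)  # the same correction term for every column
    return [scale * (i_j - zero_point * input_sum) for i_j in currents]


def reference_matvec(levels, inputs, scale, zero_point):
    """Same product computed directly on dequantized weights, for comparison."""
    cols = len(levels[0])
    return [
        sum(x * scale * (row[j] - zero_point) for x, row in zip(inputs, levels))
        for j in range(cols)
    ]


if __name__ == "__main__":
    levels = [[0, 15, 6], [7, 11, 4]]  # 2 input rows x 3 output columns
    inputs = [0.5, 1.0]
    scale, zp = 0.97 / 15, 6
    print("array:    ", array_matvec(levels, inputs, scale, zp))
    print("reference:", reference_matvec(levels, inputs, scale, zp))
```

Storing the unsigned codes directly means every cell only needs non-negative conductance levels, while the sign information carried by the zero point is handled once per column in the periphery.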

For the full text of this article, please download: http://www.chinaaet.com/resource/share/2000003598


Original content notice: This content is original to the AET website; reproduction without authorization is prohibited.
