Application of Electronic Technique
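Transformer and Semantic Enhancement for Crowd Counting
Cyber Security and Data Governance, 2023, No. 5
He Qing, Yang Qianqian, Peng Sifan, Yin Baoqun
(School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China)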
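Abstract: To address scale variation in crowd images, this paper proposes a crowd counting algorithm based on Transformer and semantic enhancement. First, a Transformer is introduced as the backbone. It models the global context and obtains a global receptive field, which lets the network cope with scale variation. Next, feature maps from adjacent levels of the backbone are fused in turn from top to bottom, and the semantic information of the multi-level feature maps is strengthened during fusion. Dynamic feature selection is then applied to the multi-level feature maps to pick out the features best suited to density map generation. Finally, attention maps adjust the density map to suppress background interference, yielding high-quality crowd density maps. Extensive experiments on the ShanghaiTech, UCF-QNRF and JHU-CROWD++ datasets verify the effectiveness of the algorithm. The results show that the proposed algorithm effectively improves the accuracy and robustness of the model.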
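The abstract names the pipeline stages, but this excerpt does not describe how any of them work internally. The NumPy sketch below is therefore a hypothetical illustration built on stated assumptions. It is not the authors' implementation. Random arrays stand in for the multi-level Transformer features. An FPN-style "upsample and add" stands in for the top-down fusion with semantic enhancement. A per-pixel softmax over levels stands in for dynamic feature selection. A sigmoid attention map multiplies the raw density map to damp background. All function names (`top_down_fuse`, `select_features`, `predict_density`) and all weights are hypothetical.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_fuse(features):
    """Top-down fusion of adjacent levels (hypothetical FPN-style stand-in).

    `features` is ordered fine -> coarse. Each level has half the spatial
    size of the previous one and the same channel count. Starting from the
    deepest, most semantic map, each fused map is upsampled and added to the
    next finer level, so deep semantics propagate into shallow levels.
    """
    fused = [features[-1]]
    for f in reversed(features[:-1]):
        fused.insert(0, f + upsample2x(fused[0]))
    return fused

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def select_features(fused):
    """Hypothetical dynamic selection: per-pixel softmax weights over levels."""
    target = fused[0].shape[1:]
    aligned = []
    for f in fused:
        while f.shape[1:] != target:   # bring every level to the finest size
            f = upsample2x(f)
        aligned.append(f)
    stack = np.stack(aligned)                   # (L, C, H, W)
    scores = stack.mean(axis=1, keepdims=True)  # (L, 1, H, W) toy relevance score
    weights = softmax(scores, axis=0)           # weights sum to 1 over levels
    return (weights * stack).sum(axis=0)        # (C, H, W)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_density(selected, w_density, w_attn):
    """Hypothetical 1x1 heads: raw density map times an attention map."""
    raw = np.maximum(np.einsum("c,chw->hw", w_density, selected), 0.0)  # ReLU: density >= 0
    attn = sigmoid(np.einsum("c,chw->hw", w_attn, selected))          # values in (0, 1)
    return raw * attn

rng = np.random.default_rng(0)
C = 8
feats = [rng.standard_normal((C, 32 // 2**i, 32 // 2**i)) for i in range(3)]  # 32, 16, 8
density = predict_density(select_features(top_down_fuse(feats)),
                          rng.standard_normal(C), rng.standard_normal(C))
print(density.shape, "estimated count:", density.sum())
```

The estimated count is read off as the sum of the final map. Multiplying by an attention map with values in (0, 1) can only lower each pixel. In this sketch, that is how background responses are damped.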
CLC number: TP391.1
Document code: A
DOI:10.19358/j.issn.2097-1788.2023.05.009
Citation: He Qing, Yang Qianqian, Peng Sifan, et al. Transformer and semantic enhancement for crowd counting[J]. Cyber Security and Data Governance, 2023, 42(5): 50-58.
Key words: crowd counting; Transformer; semantic enhancement; feature selection

0 Introduction

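Crowd counting plays an important role in video surveillance, crowd analysis and public safety. Large-scale gatherings happen frequently, so analyzing crowds in congested scenes is essential. The practical application of crowd counting is still heavily constrained, however. Among these constraints, the inconsistent size of heads within an image has drawn particular attention from researchers. Camera height and viewing angle are limited, so captured images suffer from perspective distortion, and target scales within one image differ widely. As shown in Figure 1, targets far from the camera appear small, while those close to it appear large. To address scale variation, this paper proposes a crowd counting algorithm based on Transformer and semantic enhancement. The algorithm uses a Transformer to obtain a global receptive field. It fuses features of adjacent levels from top to bottom while enhancing their semantic information. It then dynamically selects the features suited to density map generation, producing high-quality crowd density maps.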
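This excerpt does not say how the ground-truth density maps are built. A widespread convention in crowd counting turns each head annotation into a normalized Gaussian, so that the map sums to the head count. It often uses the geometry-adaptive kernels popularized by the MCNN work (Zhang et al., 2016), whose width follows the local head spacing and therefore, roughly, the perspective-induced scale. This convention is assumed here and is not confirmed as this paper's choice. The function name `density_map` and its defaults (`beta=0.3`, `k=3`, the fallback `default_sigma`) are illustrative assumptions.

```python
import numpy as np

def density_map(points, shape, beta=0.3, k=3, default_sigma=15.0):
    """Build a density map from head points given as (x, y) pixel coordinates.

    Points must lie inside the image. Each head adds a 2-D Gaussian
    renormalised to sum to 1 over the image, so the map sums to the head
    count. Sigma is beta times the mean distance to the k nearest heads:
    tightly packed heads (typically far from the camera, hence small) get
    narrow kernels, isolated heads get wide ones.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]              # pixel row / column grids
    pts = np.asarray(points, dtype=float)
    dmap = np.zeros(shape)
    for x, y in pts:
        if len(pts) > 1:
            dists = np.sqrt(((pts - (x, y)) ** 2).sum(axis=1))
            nearest = np.sort(dists)[1:k + 1]        # skip the zero self-distance
            sigma = max(beta * nearest.mean(), 1.0)  # floor avoids degenerate kernels
        else:
            sigma = default_sigma                    # lone head: no neighbours to measure
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        dmap += g / g.sum()                          # each head integrates to exactly 1
    return dmap

heads = [(10, 10), (14, 10), (12, 14), (50, 40)]
dm = density_map(heads, (64, 64))
print(round(float(dm.sum()), 6))  # 4.0: the count is the sum of the map
```

Dividing by `g.sum()` instead of the analytic normaliser 2πσ² keeps each head's mass at 1 even when the image border clips its kernel. Counts read from the map therefore stay exact. A density-regression network is then typically trained to reproduce such maps, and its predicted count is the sum of its output.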



For the full text of this paper, please download: https://www.chinaaet.com/resource/share/2000005334







This content is original to the AET website; reproduction without authorization is prohibited.