(Department of Intelligent Manufacturing, Wuyi University, Jiangmen, Guangdong 529020, China)
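Abstract: The scene graph generation (SGG) task aims to detect visual relationship triples in an image, i.e., (subject, predicate, object), providing a structured visual layout for scene understanding. However, existing scene graph generation methods overlook the fact that the predicates they predict are frequent but uninformative, which hinders progress in the field. To address this problem, a scene graph generation algorithm based on enhanced semantic information understanding is proposed. The model consists of four parts: a feature extraction module, an image cropping module, a semantic transformation module, and an extended information predicate module. The feature extraction and image cropping modules extract visual features and make them global and diverse. The semantic transformation module uses the semantic relationships between predicates to recover informative predictions from common ones. The extended information predicate module expands the sampling space of informative predicates. Compared with other methods on the VG and VG-MSDN datasets, the algorithm achieves average recall of 59.5% and 40.9%, respectively. The algorithm alleviates the lack of informativeness in predicted predicates and thereby improves the performance of scene graph generation.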
DOI: 10.16157/j.issn.0258-7998.223276
Chinese citation format: 曾军英,陈运雄,秦传波,等. 基于增强语义信息理解的场景图生成[J]. 电子技术应用,2023,49(5):52-56.
English citation format: Zeng Junying, Chen Yunxiong, Qin Chuanbo, et al. Scene graph generation based on enhanced semantic information understanding[J]. Application of Electronic Technique, 2023, 49(5): 52-56.
Scene graph generation based on enhanced semantic information understanding
Zeng Junying,Chen Yunxiong,Qin Chuanbo,Chen Yucong,Wang Yingbo,Tian Huiming,Gu Yajin
(Department of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, China)
Abstract: The scene graph generation (SGG) task aims to detect visual relationship triples (subject, predicate, object) in images, providing a structured visual layout for scene understanding. However, existing scene graph generation methods overlook the problem that the predicates they predict are frequent but uninformative, which hinders progress in this field. To address this problem, this paper proposes a scene graph generation algorithm based on enhanced semantic information understanding. The model consists of four parts: a feature extraction module, an image cropping module, a semantic transformation module, and an extended information predicate module. The feature extraction and image cropping modules extract visual features and make them global and diverse. The semantic transformation module exploits the semantic relationships between predicates to recover informative predictions from common ones. The extended information predicate module extends the sampling space of informative predicates. Compared with other methods on the VG and VG-MSDN datasets, the proposed algorithm achieves average recall of 59.5% and 40.9%, respectively. The algorithm alleviates the insufficient informativeness of predicted predicates and thereby improves scene graph generation performance.
Key words: scene graph generation; image cropping; semantic transformation; extended information
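The abstract reports results as average recall. In SGG work, recall is usually measured over the top-K predicted triples: R@K averages, per image, the fraction of ground-truth triples recovered, while its class-balanced variant mR@K averages recall over predicate classes, so rare, informative predicates weigh as much as frequent ones. The Python sketch below assumes that reading and is not the paper's evaluation code. It compares triples by exact label match (instance suffixes such as `man1` are a hypothetical stand-in for already-matched boxes), ignores IoU-based box matching and graph constraints, and pools hits across images per predicate, whereas benchmark code typically averages per image first. All names and toy data are illustrative.

```python
from collections import defaultdict
from typing import List, Tuple

# A relation triple (subject, predicate, object). Instance suffixes such as
# "man1" stand in for already-matched boxes (hypothetical simplification).
Triple = Tuple[str, str, str]
# One image: ground-truth triples and predicted triples sorted by score.
Image = Tuple[List[Triple], List[Triple]]


def recall_at_k(images: List[Image], k: int) -> float:
    """R@K: fraction of ground-truth triples found in the top-k, averaged over images."""
    scores = []
    for gt, pred in images:
        if not gt:
            continue
        top_k = set(pred[:k])
        scores.append(sum(t in top_k for t in gt) / len(gt))
    return sum(scores) / len(scores) if scores else 0.0


def mean_recall_at_k(images: List[Image], k: int) -> float:
    """mR@K: recall computed separately for each predicate class, then averaged.

    Simplification: hits are pooled over all images for each predicate.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for gt, pred in images:
        top_k = set(pred[:k])
        for t in gt:
            predicate = t[1]
            totals[predicate] += 1
            if t in top_k:
                hits[predicate] += 1
    per_class = [hits[p] / totals[p] for p in totals]
    return sum(per_class) / len(per_class) if per_class else 0.0


# Toy data: in image 1 the frequent predicate "on" takes a top-2 slot,
# so the informative ground truth "riding" is missed.
images: List[Image] = [
    (
        [("man1", "riding", "horse1"), ("man1", "wearing", "hat1")],
        [("man1", "on", "horse1"), ("man1", "wearing", "hat1"), ("man1", "riding", "horse1")],
    ),
    (
        [("cup1", "on", "table1"), ("plate1", "on", "table1")],
        [("cup1", "on", "table1"), ("plate1", "on", "table1")],
    ),
]

print(f"R@2  = {recall_at_k(images, 2):.3f}")
print(f"mR@2 = {mean_recall_at_k(images, 2):.3f}")
```

In this toy case R@2 is 0.750 while mR@2 is 0.667: missing the single rare `riding` triple costs a full predicate class in mR@K, which is why class-balanced recall is commonly reported by SGG methods that target informative predicates.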
0 Introduction
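The goal of the scene graph generation (SGG) task is to generate a graph-structured representation from a given image that abstracts the objects (grounded by bounding boxes) and their pairwise relationships. Scene graphs are intended to facilitate the understanding of complex scenes in images and have broad potential for downstream applications such as image retrieval, visual reasoning, visual question answering (VQA), image captioning, structured image generation and outpainting, and robotics. A good scene graph provides informative relationships between the instances of interest. Most existing scene graph generation methods follow a common paradigm: detect objects in the image, extract region features, and then recognize predicate categories under the guidance of a standard classification objective. However, this paradigm has several shortcomings.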
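To make the common paradigm concrete, here is a minimal Python sketch of its three steps, not the model proposed in this paper. Assumptions: the object detector and feature extractor are stubbed with hand-written `Detection` objects; the predicate vocabulary `PREDICATES`, the 2-D region features and the linear weights `W` are hypothetical toy values; and the "standard classification objective" is taken to be softmax cross-entropy over predicate classes.

```python
import math
from dataclasses import dataclass
from itertools import permutations
from typing import List, Tuple

# Hypothetical predicate vocabulary for the toy example.
PREDICATES = ["on", "riding", "near"]


@dataclass
class Detection:
    """Output of the (stubbed) object detector for one region."""
    label: str
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2); unused by this toy classifier
    feature: List[float]                    # region feature from the (stubbed) extractor


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predicate_logits(subj: Detection, obj: Detection, weights: List[List[float]]) -> List[float]:
    """Hypothetical linear predicate classifier over the concatenated pair feature."""
    pair = subj.feature + obj.feature
    return [sum(w * x for w, x in zip(row, pair)) for row in weights]


def cross_entropy(logits: List[float], target: int) -> float:
    """Classification objective: negative log-likelihood of the target predicate."""
    return -math.log(softmax(logits)[target])


def generate_scene_graph(dets: List[Detection], weights: List[List[float]]):
    """Label every ordered subject-object pair with its most probable predicate."""
    triples = []
    for subj, obj in permutations(dets, 2):
        probs = softmax(predicate_logits(subj, obj, weights))
        best = max(range(len(probs)), key=probs.__getitem__)
        triples.append((subj.label, PREDICATES[best], obj.label, probs[best]))
    # Rank triples by confidence so that the top-K can be taken.
    return sorted(triples, key=lambda t: t[3], reverse=True)


# Toy usage: two stubbed detections and hand-picked weights (one row per predicate).
man = Detection("man", (10.0, 20.0, 60.0, 120.0), [1.0, 0.0])
horse = Detection("horse", (30.0, 60.0, 140.0, 160.0), [0.0, 1.0])
W = [
    [0.5, 0.5, 0.5, 0.5],  # on
    [1.5, 0.0, 0.0, 1.5],  # riding
    [0.0, 1.0, 1.0, 0.0],  # near
]

graph = generate_scene_graph([man, horse], W)
for s, p, o, score in graph:
    print(f"{s} --{p}--> {o} ({score:.2f})")

loss = cross_entropy(predicate_logits(man, horse, W), PREDICATES.index("riding"))
print(f"cross-entropy for 'riding': {loss:.3f}")
```

Each ordered pair receives exactly one predicate through the argmax, so whatever bias the classifier has toward particular predicate classes passes directly into the scene graph.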