Segmentation-Based Text Detection Method and Application in Natural Scenes - AET - Application of Electronic Technique

Segmentation-Based Text Detection Method and Application in Natural Scenes

Application of Electronic Technique, 2021, Issue 2

Chen Xiaoshun, Wang Liangjun

School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013, Jiangsu, China

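Abstract: Text detection and recognition in natural scenes are widely used in intelligent devices, and the first step of text recognition is to locate the text precisely. This paper proposes improvements that address two main problems of the existing pixel segmentation method PixelLink: its localization of curved text includes too much background information, and its post-processing of the detection results is insufficient. First, a feature channel attention mechanism is introduced that models the weight relationships between feature channels in the generated feature maps, improving the robustness of the detection method. Next, the annotation format of the public datasets is changed so that the coordinate points are represented as a directed sequence, and polygon boxes are learned and fitted by an LSTM model. Finally, text detection tests are carried out on public datasets and a self-built dataset. Experiments show that the improved method outperforms the original method on every dataset, achieves accuracy close to that of current leading methods, and can detect text in a variety of environments.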
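The abstract does not say exactly how the feature channel attention is built. A common design for "weighting feature channels" is a squeeze-and-excitation (SE) block, so the sketch below assumes that design. It is a pure-Python toy, not the authors' network. Each channel is squeezed to one number by global average pooling, a small bottleneck layer turns those numbers into per-channel weights in (0, 1), and each channel is rescaled by its weight. The names `channel_attention`, `w_reduce` and `w_expand` are hypothetical. In a real model the two weight matrices are learned; here they are hand-picked.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]   # a 2-D array: one H x W feature map, or a weight matrix
FeatureMaps = List[Matrix]   # C feature maps, one per channel


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(weights: Matrix, vec: List[float]) -> List[float]:
    # Dense layer without bias: each output is the dot product of one weight row with vec.
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def channel_attention(fmaps: FeatureMaps, w_reduce: Matrix,
                      w_expand: Matrix) -> Tuple[FeatureMaps, List[float]]:
    """SE-style channel reweighting (illustrative sketch, assumed design).

    fmaps:    C feature maps, each H x W.
    w_reduce: (C // r) x C weights of the bottleneck layer (r = reduction ratio).
    w_expand: C x (C // r) weights of the expansion layer.
    Returns the reweighted maps and the per-channel weights.
    """
    # Squeeze: global average pooling gives one scalar descriptor per channel.
    squeezed = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in fmaps]
    # Excitation: bottleneck + ReLU, then expansion + sigmoid -> weights in (0, 1).
    hidden = [max(0.0, h) for h in _matvec(w_reduce, squeezed)]
    weights = [_sigmoid(a) for a in _matvec(w_expand, hidden)]
    # Scale: multiply every activation of channel c by weights[c], so channels with
    # small weights are suppressed and channels with large weights are kept.
    scaled = [[[wc * x for x in row] for row in fm] for wc, fm in zip(weights, fmaps)]
    return scaled, weights


# Toy usage: two identical 2 x 2 channels, reduction ratio r = 2 (one hidden unit).
maps = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
scaled, weights = channel_attention(maps, w_reduce=[[1.0, 0.0]], w_expand=[[2.0], [-2.0]])
print([round(w, 3) for w in weights])  # [0.881, 0.119]
```

The two input channels are identical, so the different output weights come entirely from the excitation layer. This is the mechanism by which a trained block can favor informative channels over uninformative ones.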

Key words: pixel segmentation; attention mechanism; LSTM; natural scene text detection

CLC number: TN911.73; TP391.4
Document code: A
DOI: 10.16157/j.issn.0258-7998.200316
Chinese citation format: 陈小顺，王良君. 基于分割的自然场景下文本检测方法与应用[J]. 电子技术应用，2021，47(2)：54-57.
English citation format: Chen Xiaoshun, Wang Liangjun. Text detection and application in natural scene based on segmentation[J]. Application of Electronic Technique, 2021, 47(2): 54-57.

Text detection and application in natural scene based on segmentation

Chen Xiaoshun, Wang Liangjun

School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013, China

Abstract: Text recognition in natural scenes is currently used in a wide range of intelligent devices, and the first step of text recognition is to locate the text precisely. The PixelLink text location method has two main problems. First, too much background information is included in the located text region. Second, post-processing of the detection results is insufficient, which limits detection accuracy. To address these issues, an improved method is proposed to locate text in natural scenes precisely. First, an attention mechanism is incorporated into the original network. By modeling the weight relationships between feature channels in the generated feature map, it increases the weights of effective feature channels and suppresses the weights of inefficient or invalid ones. Second, the annotation format of the datasets is changed so that the coordinate points are expressed as an ordered sequence, which allows text lines to be framed adaptively by an LSTM model. Third, each located object is rotated according to the angle between a pair of vertices of its polygon box and then fed to a text recognition interface to obtain the final characters. Finally, text detection tests are carried out on public datasets and a self-built dataset. The experimental results show that the improved method outperforms the original method on each dataset, and its accuracy is close to that of current leading methods.
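The rotation step can be illustrated on polygon coordinates alone. The abstract does not say which vertex pair is used, so this sketch makes a hypothetical assumption: it uses the first two points of the ordered polygon, such as the top-left vertex and the next vertex along the top edge. It measures that edge's angle with `atan2` and rotates the polygon about its vertex centroid by the opposite angle, so the edge becomes horizontal. The function names are illustrative. Rotating the image crop itself would apply the same transform to the pixels, for example with OpenCV's `cv2.getRotationMatrix2D` and `cv2.warpAffine`. OpenCV's sign convention should be checked against the one used here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def edge_angle(p1: Point, p2: Point) -> float:
    """Angle in degrees of the edge p1 -> p2, measured from the +x axis."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))


def rotate_points(points: List[Point], angle_deg: float, center: Point) -> List[Point]:
    """Apply the standard 2-D rotation matrix by angle_deg about center."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    cx, cy = center
    return [(cx + (x - cx) * c - (y - cy) * s,
             cy + (x - cx) * s + (y - cy) * c) for x, y in points]


def level_polygon(poly: List[Point]) -> Tuple[List[Point], float]:
    """Rotate poly about its vertex centroid so edge poly[0] -> poly[1] becomes horizontal.

    Hypothetical assumption: poly[0] and poly[1] are the first two points of the
    ordered annotation (e.g. top-left and the next point along the top edge).
    """
    angle = edge_angle(poly[0], poly[1])
    center = (sum(x for x, _ in poly) / len(poly), sum(y for _, y in poly) / len(poly))
    # Rotating by -angle cancels the tilt of the chosen edge.
    return rotate_points(poly, -angle, center), angle


quad = [(0.0, 0.0), (2.0, 2.0), (1.0, 3.0), (-1.0, 1.0)]  # a rectangle tilted by 45 degrees
leveled, angle = level_polygon(quad)
print(round(angle, 6))  # 45.0
```

The algebra does not depend on whether y points up or down. In image coordinates, where y points down, a positive angle simply looks clockwise on screen.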

Key words: pixel segmentation; attention mechanism; LSTM; natural scene text detection
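Both abstracts describe re-expressing annotation coordinates as an ordered, directed sequence before an LSTM learns the polygon. The exact convention is not given. This sketch assumes one plausible normalization, which is hypothetical. Every polygon is made clockwise as seen on screen, using the sign of the shoelace area in image coordinates where y points down. The sequence then starts at the top-left-most vertex. Under this reading, each point of the result would be one step of the sequence fed to the model.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(poly: List[Point]) -> float:
    """Shoelace formula. In image coordinates (y pointing down) a positive
    value means the vertices run clockwise as seen on screen."""
    nxt = poly[1:] + poly[:1]
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(poly, nxt))


def to_directed_sequence(poly: List[Point]) -> List[Point]:
    """Hypothetical normalization: clockwise on screen, starting at the top-left-most vertex."""
    pts = list(poly)
    if signed_area(pts) < 0:
        pts.reverse()  # flip counter-clockwise annotations to clockwise
    # Start at the vertex with the smallest x + y (ties broken by smaller x).
    start = min(range(len(pts)), key=lambda i: (pts[i][0] + pts[i][1], pts[i][0]))
    return pts[start:] + pts[:start]


ccw = [(0, 0), (0, 10), (10, 10), (10, 0)]  # TL, BL, BR, TR: counter-clockwise on screen
print(to_directed_sequence(ccw))  # [(0, 0), (10, 0), (10, 10), (0, 10)]
```

The orientation test uses signed area rather than sorting points by angle around the centroid. Sorting by angle can scramble non-convex polygons, such as those that outline curved text, whereas signed area works for any simple polygon.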

0 Introduction

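Visual images are the main source through which people obtain information about the outside world, and text is a condensed description of things. People capture text with their eyes to obtain information, whereas the eyes of a machine are cold cameras. How to enable machines to accurately detect and recognize text in the images they capture has increasingly attracted the attention of researchers.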

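Most modern text detection methods are based on deep learning and fall into two main categories: methods based on candidate boxes and methods based on pixel segmentation. This paper adopts a pixel-segmentation-based deep learning model as its main research direction for text detection and recognition. Such a model enables precise detection of text in natural scenes and also leaves room for extending the device's functions later, for example with semantic analysis.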
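In PixelLink, the pixel segmentation baseline named in the abstract, the network predicts two things for every pixel: a text/non-text score and eight link scores, one for each neighbor. Positive pixels joined by positive links are merged into connected components, each component becomes one text instance, and a minimum-area rectangle is then fitted to each component. The sketch below covers only the grouping step, using union-find on already-thresholded boolean masks. The mask layout is an assumption of this sketch, and so is the rule that a link counts when either of its two endpoints predicts it positive. `group_pixels` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# 8-neighborhood offsets (dy, dx); index k is the link direction.
# NEIGHBORS[7 - k] is always the opposite direction of NEIGHBORS[k].
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def _find(parent: List[int], i: int) -> int:
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
        i = parent[i]
    return i


def group_pixels(pixel_pos: List[List[bool]],
                 link_pos: List[List[List[bool]]]) -> List[List[Tuple[int, int]]]:
    """Union positive pixels joined by a positive link; return one (y, x) list per instance."""
    h, w = len(pixel_pos), len(pixel_pos[0])
    parent = list(range(h * w))
    for y in range(h):
        for x in range(w):
            if not pixel_pos[y][x]:
                continue
            for k, (dy, dx) in enumerate(NEIGHBORS):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w and pixel_pos[ny][nx]):
                    continue
                # Assumed rule: link is positive if either endpoint predicts it.
                if link_pos[y][x][k] or link_pos[ny][nx][7 - k]:
                    ra, rb = _find(parent, y * w + x), _find(parent, ny * w + nx)
                    if ra != rb:
                        parent[rb] = ra
    groups: Dict[int, List[Tuple[int, int]]] = {}
    for y in range(h):
        for x in range(w):
            if pixel_pos[y][x]:
                groups.setdefault(_find(parent, y * w + x), []).append((y, x))
    return list(groups.values())


# Toy usage: three positive pixels in a row, only the first pair is linked.
pixels = [[True, True, True]]
links = [[[False] * 8 for _ in range(3)]]
links[0][0][4] = True  # pixel (0, 0) predicts a positive link to its right neighbor
print(group_pixels(pixels, links))  # [[(0, 0), (0, 1)], [(0, 2)]]
```

Because instances are separated by missing links rather than by gaps in the pixel mask, adjacent text lines can still be split apart. A rectangle fitted to a curved component, however, tends to include background, which is the weakness the abstract targets with polygon boxes.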

For the full text of this paper, please download: http://www.chinaaet.com/resource/share/2000003385

