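Parallel prediction of document-level event extraction method via pseudo trigger words
Application of Electronic Technique
Qin Haitao1,2, Xian Yantuan1,2, Xiang Yan1,2, Huang Yuxin1,2
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology; 2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology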
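Abstract: Document-level event extraction is generally divided into three subtasks: candidate entity recognition, event detection, and argument recognition. These subtasks are executed in a cascade, which causes error propagation. In addition, most existing models predict the number of events only implicitly during decoding, and they can predict event arguments only in a predefined order of events and roles, so events extracted earlier take no account of events extracted later. To address these problems, a multi-task joint framework for parallel-prediction event extraction is proposed. First, a pre-trained language model is used as the encoder for document sentences to detect the event types present in the document, and a structured self-attention mechanism is used to obtain pseudo-trigger word features and predict the number of events of each type. The pseudo-trigger word features then interact with candidate argument features to predict the arguments of every event in parallel, which greatly reduces model training time while achieving better performance than the baseline models. The final event extraction F1 score is 78%; the F1 scores for the event type detection, event number prediction, and entity recognition subtasks are 98.7%, 90.1%, and 90.3%, respectively.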
CLC number: TP391 Document code: A DOI: 10.16157/j.issn.0258-7998.244868
Citation format (Chinese): 秦海涛,线岩团,相艳,等. 基于伪触发词的并行预测篇章级事件抽取方法[J]. 电子技术应用,2024,50(4):67-74.
Citation format (English): Qin Haitao, Xian Yantuan, Xiang Yan, et al. Parallel prediction of document-level event extraction method via pseudo trigger words[J]. Application of Electronic Technique, 2024, 50(4): 67-74.
Parallel prediction of document-level event extraction method via pseudo trigger words
Qin Haitao1,2, Xian Yantuan1,2, Xiang Yan1,2, Huang Yuxin1,2
1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology; 2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology
Abstract: Document-level event extraction is generally divided into three subtasks: candidate entity recognition, event detection, and argument recognition. The conventional approach performs these subtasks sequentially in a cascade, which leads to error propagation. In addition, most existing models predict the number of events only implicitly during decoding and predict event arguments in a predefined order of events and roles, so events extracted earlier do not take later events into account. To address these issues, this paper proposes a multi-task joint framework for parallel event extraction. First, a pre-trained language model is used as the encoder for document sentences, and the framework detects the event types present in the document. It then uses a structured self-attention mechanism to obtain pseudo-trigger word features and predicts the number of events for each event type. Finally, the pseudo-trigger word features interact with candidate argument features, and the arguments of each event are predicted in parallel, which significantly reduces model training time while achieving better performance than the baseline models. The final F1 score for event extraction is 78%, with F1 scores of 98.7% for the event type detection subtask, 90.1% for the event number prediction subtask, and 90.3% for the entity recognition subtask.
Key words: document-level event extraction; multi-task joint; pre-trained language model; structured self-attention mechanism; parallel prediction
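The abstract does not give the formulation of the structured self-attention step, so the following is only a minimal sketch under stated assumptions. It uses the common form A = softmax(W2 tanh(W1 H^T)), M = AH, in which each of the r attention rows pools the n sentence vectors into one feature vector that could play the role of a pseudo-trigger word feature. The function name `structured_self_attention`, the matrices `W1` and `W2`, and the toy values are hypothetical; in a trained model the weights would be learned and H would come from the pre-trained encoder.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def structured_self_attention(H, W1, W2):
    """Assumed formulation: A = softmax(W2 tanh(W1 H^T)), M = A H.

    H  : n x d   sentence vectors (stand-in for encoder output)
    W1 : da x d  projection (hypothetical, normally learned)
    W2 : r x da  one row per attention head / pseudo-trigger slot
    Returns (A, M): A is r x n attention weights, M is r x d pooled features.
    """
    n, d = len(H), len(H[0])
    # hidden[i] = tanh(W1 h_i)
    hidden = [[math.tanh(x) for x in matvec(W1, h)] for h in H]
    # scores[k][i] = W2[k] . hidden[i], normalized over the n sentences
    A = [softmax([sum(w * x for w, x in zip(w2_row, hid)) for hid in hidden])
         for w2_row in W2]
    # M[k] = sum_i A[k][i] * H[i]
    M = [[sum(A[k][i] * H[i][j] for i in range(n)) for j in range(d)]
         for k in range(len(A))]
    return A, M


# Toy input: 3 sentence vectors of dimension 2, r = 2 pseudo-trigger slots.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W1 = [[0.5, -0.5], [0.3, 0.8]]
W2 = [[1.0, 0.0], [0.0, 1.0]]
A, M = structured_self_attention(H, W1, W2)
print(A)
print(M)
```

Each row of `A` is a probability distribution over sentences, so each row of `M` is a weighted average of the sentence vectors. How the paper then maps these features to event counts and arguments is not specified in this excerpt and is not sketched here.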
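Introduction
In recent years the Internet has developed rapidly, and online media produce large amounts of information every day. Event extraction, a branch of information extraction, extracts structured information from such unstructured text[1] and helps people analyze and make decisions quickly and effectively. It is an important research task in natural language processing and is widely applied in intelligent question answering, information retrieval, automatic summarization, recommendation, and other fields.
By text granularity, event extraction can be divided into sentence-level event extraction[2-6] and document-level event extraction[7-18]. Sentence-level event extraction usually first identifies the trigger words in a sentence[1-2] to detect the event type and then extracts the corresponding event arguments (elements), whereas Li et al.[4] and Nguyen et al.[5] use joint models that capture the semantic relations between entities and events and identify events and entities simultaneously, which improves extraction accuracy. However, as the amount of text grows, some trigger-based sentence-level methods are no longer applicable, and because document-level information is more broadly applicable in everyday life, document-level event extraction has received wider attention.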
The full text of this article is available for download at:
http://www.chinaaet.com/resource/share/2000005951
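Author information:
Qin Haitao1,2, Xian Yantuan1,2, Xiang Yan1,2, Huang Yuxin1,2
(1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500;
2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500)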