Frontiers of Data and Computing, 2023, Vol. 5, Issue (5): 140-153.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.05.012

doi: 10.11871/jfdc.issn.2096-742X.2023.05.012

• Technology and Application •

Multi-Level Data Augmentation Method for Aspect-Based Sentiment Analysis

ZHANG Rong1,*, LIU Yuan2

  1. School of Internet of Things Engineering, Jiangsu Vocational College of Information Technology, Wuxi, Jiangsu 214153, China
  2. School of Artificial Intelligence and Computer, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2023-03-17 Online: 2023-10-20 Published: 2023-10-31
  • Corresponding author: ZHANG Rong (E-mail: 95291188@qq.com)
  • About the author: ZHANG Rong, master's degree, is an associate professor at Jiangsu Vocational College of Information Technology. Her main research interests include natural language processing and big data analysis.
    In this paper, she is responsible for drafting the initial manuscript and developing MLDA.
    E-mail: 100401@jsit.edu.cn
  • Funding:
    National Natural Science Foundation of China project "Scalable and Reconfigurable Simulation Technology for Space-Ground Integrated Information Networks" (61972182); Jiangsu Province Higher Vocational Education High-Level Specialty Group Construction Project "Internet of Things Application Technology" (Su Jiao Zhi Han [2021] No. 1); Jiangsu Universities "Qing Lan Project" Excellent Teaching Team "Internet of Things Application Technology" (Su Jiao Ban Shi Han [2021] No. 23)

Abstract:

[Objective] Aspect-based sentiment analysis provides better insight into user reviews and has become a research hotspot in recent years. To address the difficulty of obtaining labeled data for this task, this paper designs a simple and effective multi-level data augmentation method. [Methods] Without changing the sentiment polarity, the method performs sentence-level adjacent-word replacement, domain-level same-category-word replacement, and word-vector-level synonym replacement on specific target aspects in a review. This preserves label invariance while generating diverse synthetic training samples. Each augmentation method can be applied on its own or in random combination with the others. [Results] The proposed scheme is applied to an attention-mechanism model with a pre-trained model and to a dependency-tree model with a pre-trained model, and it is also applied within a contrastive learning framework. Experiments on SemEval 2014 Task 4 Subtask 2 show that the proposed data augmentation method is effective: both Accuracy and Macro-F1 exceed the baseline values. [Conclusions] The multi-level data augmentation method can effectively alleviate the shortage of data in aspect-based sentiment analysis tasks. It can supplement the original training data for joint training, and it can also be used to construct positive samples for contrastive learning in multi-task training.

Key words: aspect-based sentiment analysis, pre-trained model, data augmentation, dependency parse tree, attention mechanism, contrastive learning
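
The aspect-replacement idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MLDA implementation. The `Sample` class, the `augment` function, and the hand-written `CANDIDATES` tables are hypothetical, and the abstract does not spell out how adjacent words, same-category words, or synonyms are actually mined. "Random combination" is read here as drawing a random level for each generated sample, which is also an assumption.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    tokens: Tuple[str, ...]  # tokenized review
    start: int               # aspect span start (inclusive)
    end: int                 # aspect span end (exclusive)
    polarity: str            # sentiment label; never changed by augmentation

    @property
    def aspect(self) -> str:
        return " ".join(self.tokens[self.start:self.end])


# Hypothetical toy candidate tables, one per replacement level.
# A real system would mine these from the review, the domain corpus,
# and a word-embedding model respectively.
CANDIDATES: Dict[str, Dict[str, List[str]]] = {
    "sentence": {"pizza": ["pasta"]},
    "domain": {"pizza": ["burger", "sushi"]},
    "embedding": {"pizza": ["pie"]},
}


def replace_aspect(sample: Sample, replacement: str) -> Sample:
    """Swap the aspect span for `replacement`, keeping the label fixed."""
    new = tuple(replacement.split())
    tokens = sample.tokens[:sample.start] + new + sample.tokens[sample.end:]
    return Sample(tokens, sample.start, sample.start + len(new), sample.polarity)


def augment(sample: Sample, levels: Sequence[str],
            rng: random.Random, n: int = 1) -> List[Sample]:
    """Generate up to n synthetic samples from the enabled levels.

    One level in `levels` uses that method alone; several levels means
    a level is drawn at random for each generated sample.
    """
    usable = [lv for lv in levels if CANDIDATES[lv].get(sample.aspect)]
    if not usable:
        return []  # no substitute known for this aspect term
    out = []
    for _ in range(n):
        level = rng.choice(usable)
        out.append(replace_aspect(sample, rng.choice(CANDIDATES[level][sample.aspect])))
    return out
```

For example, `augment(Sample(("the", "pizza", "was", "great"), 1, 2, "positive"), ["sentence", "domain", "embedding"], random.Random(0), n=4)` returns four reviews whose aspect term has been swapped while the `positive` label is carried over unchanged. The synthetic samples could then be pooled with the original training data or paired with their source review as positives for contrastive learning.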