Multi-Level Data Augmentation Method for Aspect-Based Sentiment Analysis

doi:10.11871/jfdc.issn.2096-742X.2023.05.012

Frontiers of Data and Computing, 2023, Vol. 5, Issue (5): 140-153.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.05.012


ZHANG Rong^1,*, LIU Yuan^2

1. School of Internet of Things Engineering, Jiangsu Vocational College of Information Technology, Wuxi, Jiangsu 214153, China
2. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China

Received: 2023-03-17 Online: 2023-10-20 Published: 2023-10-31
Corresponding author: *ZHANG Rong (E-mail: 95291188@qq.com)
About the author: ZHANG Rong, master's degree, is an associate professor at Jiangsu Vocational College of Information Technology. Her main research interests include natural language processing and big data analysis. In this paper, she was responsible for drafting the initial manuscript and developing MLDA.
E-mail: 100401@jsit.edu.cn
Funding:
National Natural Science Foundation of China project "Scalable and Reconfigurable Simulation Technology for Space-Ground Integrated Information Networks" (61972182); Jiangsu Province Higher Vocational Education High-Level Specialty Group Construction Project "Internet of Things Application Technology" (Su Jiao Zhi Han [2021] No. 1); Jiangsu Universities "Qinglan Project" Outstanding Teaching Team "Internet of Things Application Technology" (Su Jiao Ban Shi Han [2021] No. 23)

Abstract:

[Objective] Aspect-based sentiment analysis provides better insight into user reviews and has become a research hotspot in recent years. Because labeled data for this task is difficult to obtain, this paper designs a simple and effective multi-level data augmentation (MLDA) method. [Methods] Without changing the sentiment polarity, the method replaces selected target aspects in a review at three levels: sentence-level adjacent words, domain-level words of the same category, and word-vector-level synonyms. This preserves the labels while generating diverse synthetic training samples. Each augmentation method can be applied on its own or in random combination with the others. [Results] The proposed scheme is applied to a model that combines an attention mechanism with a pre-trained model and to a model that combines a dependency tree with a pre-trained model, and it is also used within a contrastive learning framework. Experiments on SemEval 2014 Task 4 Subtask 2 show that the proposed data augmentation method is effective: both Accuracy and Macro-F1 exceed the baseline values. [Conclusions] Multi-level data augmentation can effectively alleviate the shortage of data in aspect-based sentiment analysis tasks. The augmented samples can supplement the original training data for joint training, and they can also serve as positive samples for contrastive learning in multi-task training.

Key words: aspect-based sentiment analysis, pre-trained model, data augmentation, dependency tree, attention mechanism, contrastive learning

ZHANG Rong, LIU Yuan. Multi-Level Data Augmentation Method for Aspect-Based Sentiment Analysis[J]. Frontiers of Data and Computing, 2023, 5(5): 140-153. https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2023.05.012

Figures/Tables (14): Table 1, Figures 1-4, Table 2, Figures 5-7, Table 3, Table 4, Figures 8-9, Table 5. The figures are not reproduced here.


Table 1
Review	Type
All the money went into the interior decoration, none of it went to the chefs.	Original
All the money went into the interior decoration, none of it went to the cooks.	Paraphrasing
All the money cost went into the interior decoration, none of it went to the chefs.	Noising
All the cost was spent on the interior decoration, and no money was given to the culinary.	Sampling

Table 2
Review	Type
Drivers updated ok but the BIOS update froze the system up and the computer shut down.	Original
system updated ok but the BIOS update froze the Drivers up and the computer shut down.	SL-NCR
switch updated ok but the BIOS update froze the power up and the computer shut down.	DL-SWR
racers updated ok but the BIOS update froze the software up and the computer shut down.	WVL-SWR
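
The three replacement levels illustrated in the rows above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the token positions of the aspect terms, and the two candidate dictionaries are hypothetical and were chosen only to reproduce the rows above. Where the candidates actually come from (for example, aspect terms from the same domain, or nearest neighbours in a word-embedding space) is an assumption here.

```python
import random


def swap_aspects(tokens, i, j):
    """Sentence-level replacement: exchange two aspect terms of the same review."""
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out


def replace_aspects(tokens, positions, candidates, rng):
    """Domain-level or embedding-level replacement.

    Each aspect term at `positions` is replaced by a randomly chosen
    candidate; terms without candidates are left unchanged.
    """
    out = list(tokens)
    for pos in positions:
        options = candidates.get(out[pos].lower(), [])
        if options:
            out[pos] = rng.choice(options)
    return out


review = ("Drivers updated ok but the BIOS update froze the system up "
          "and the computer shut down.")
tokens = review.split()
aspect_positions = [0, 9]  # hypothetical annotation: "Drivers" and "system"

# Hypothetical candidate lists (assumptions, not taken from the paper).
domain_terms = {"drivers": ["switch"], "system": ["power"]}
embedding_neighbours = {"drivers": ["racers"], "system": ["software"]}

rng = random.Random(0)
sl_ncr = " ".join(swap_aspects(tokens, 0, 9))
dl_swr = " ".join(replace_aspects(tokens, aspect_positions, domain_terms, rng))
wvl_swr = " ".join(replace_aspects(tokens, aspect_positions, embedding_neighbours, rng))

for name, text in [("SL-NCR", sl_ncr), ("DL-SWR", dl_swr), ("WVL-SWR", wvl_swr)]:
    print(name, "|", text)
```

Only the aspect terms change, so the sentiment label attached to each aspect position can be carried over to the synthetic sample unchanged.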

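Table 3
	Rest	Laptop
Training set
# sentences	2000	3045
# aspects	1743	2358
# positive	2164	994
# negative	807	870
# neutral	637	464
# sentences with aspects	1978	1462
% sentences with aspects	75%	47.75%
Test set
# sentences	676	800
# aspects	622	654
# positive	728	341
# negative	196	128
# neutral	196	169
# sentences with aspects	600	411
% sentences with aspects	88.8%	51.4%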

Table 4
Model	Rest Accuracy (%)	Rest Macro-F1 (%)	Laptop Accuracy (%)	Laptop Macro-F1 (%)
BERT-BASE	82.74	73.73	79.73	75.5
TD-GAT-BERT	82.80	-	80.10	-
DGEDT-BERT	86.30	80.00	79.80	75.60
LSA-BERT	87.14	81.04	81.35	78.35
AEN-BERT	83.76	79.93	79.93	76.31
RGAT-BERT	86.60	81.35	78.21	74.07
RGAT-BERT+SL-NCR	87.52	83.56	80.09	76.33
RGAT-BERT+DL-SWR	87.45	83.23	80.31	76.69
RGAT-BERT+WVL-SWR	87.40	83.05	79.93	76.12
RGAT-BERT+MLDA	87.87	82.75	81.54	77.87
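
For readers less familiar with the two metrics reported above, the following sketch shows how Accuracy and Macro-F1 are conventionally computed for a three-class polarity task. It uses the standard definitions rather than the authors' evaluation script, and the toy labels are invented for illustration and unrelated to the reported results.

```python
def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy example (made-up labels).
gold = ["positive", "positive", "negative", "neutral"]
pred = ["positive", "negative", "negative", "neutral"]
labels = ["positive", "negative", "neutral"]

acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred, labels)
print(f"accuracy={acc:.4f} macro_f1={mf1:.4f}")
```

Because Macro-F1 weights every class equally, it is more sensitive than Accuracy to performance on the smaller negative and neutral classes.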

Table 5
Model	Rest Accuracy (%)	Rest Macro-F1 (%)	Laptop Accuracy (%)	Laptop Macro-F1 (%)
AEN-BERT+SL-NCR	83.89	74.52	80.46	77.09
AEN-BERT+DL-SWR	84.8	75.65	80.09	76.67
AEN-BERT+WVL-SWR	84.73	75.54	79.96	76.64
AEN-BERT+MLDA	84.93	75.84	81.06	77.53
AEN-BERT+CL	86.37	81.10	81.42	77.96
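
The abstract states that augmented samples can serve as positive samples for contrastive learning. One common way to use such pairs is a temperature-scaled softmax loss over cosine similarities (often called NT-Xent). The sketch below illustrates that general idea for a single anchor; the loss form, the temperature value, and the toy vectors are assumptions, and the page does not confirm which contrastive objective the CL row uses.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """Single-anchor NT-Xent-style loss.

    `positive` stands for the encoding of an augmented copy of the anchor
    review; `negatives` stand for encodings of other reviews in the batch.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-d "sentence embeddings" (made up for illustration).
loss_close = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_far = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(f"aligned positive: {loss_close:.4f}, misaligned positive: {loss_far:.4f}")
```

The loss is small when the augmented sample is encoded close to its source review and large when it is not, which is what makes label-preserving augmentations usable as positives.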
