ZHU Lianmiao, YANG Bo, GUO Jiajun, CHEN Xiaoyi. Chinese sign language word translation based on global attention mechanism[J]. Journal of South-Central Minzu University (Natural Science Edition), 2022, 41(4): 499-505.
Chinese sign language word translation based on global attention mechanism
  
DOI:10.12130/znmdzk.20220417
Keywords: sign language translation; global attention mechanism; LSTM network
Funding: National Natural Science Foundation of China (61976226); Graduate Academic Innovation Fund of South-Central Minzu University (3212020sycxjj137)
Author affiliations:
ZHU Lianmiao  College of Computer Science, South-Central Minzu University, Wuhan 430074
YANG Bo  College of Computer Science, South-Central Minzu University, Wuhan 430074
GUO Jiajun  College of Computer Science, South-Central Minzu University, Wuhan 430074
CHEN Xiaoyi  College of Computer Science, South-Central Minzu University, Wuhan 430074
Abstract:
      Models that combine a convolutional neural network with a recurrent neural network struggle to focus on the key frames of a sign language video sequence in sign language translation. To address this, a sign language translation model incorporating a global attention mechanism is proposed. The model embeds the global attention mechanism in a Long Short-Term Memory (LSTM) network: an alignment vector is obtained by computing the similarity between the current hidden state and the source hidden states, and the learned alignment weights let the model attend to the key frames of long sign language video sequences, thereby improving translation accuracy. Experimental results show that the model with the global attention mechanism outperforms 3DCNN, CNN+LSTM and other mainstream models on the DEVISIGN_D dataset. On 100-class datasets of short and long sign language words, its accuracy improves by 0.87% and 1.60% respectively over the same model without the attention mechanism, confirming that the attention mechanism can effectively improve translation accuracy.
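The attention step described in the abstract (scoring the current hidden state against every source hidden state, normalising the scores into an alignment vector, then forming a weighted context) can be sketched as follows. This is a minimal NumPy illustration of Luong-style global attention using a dot-product score; the paper's exact score function, dimensions, and integration with the LSTM are not given here, so those details are assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def global_attention(h_t, h_src):
    """Global attention with a dot-product score.

    h_t:   current (decoder) hidden state, shape (d,)
    h_src: source hidden states for all T frames, shape (T, d)
    Returns the context vector (d,) and the alignment vector (T,).
    """
    scores = h_src @ h_t        # similarity of h_t with each source state, shape (T,)
    a_t = softmax(scores)       # alignment vector: weights over all T frames, sums to 1
    context = a_t @ h_src       # attention-weighted sum of source states, shape (d,)
    return context, a_t

# Usage: 5 video-frame hidden states of dimension 4 (random placeholders).
rng = np.random.default_rng(0)
h_src = rng.standard_normal((5, 4))
h_t = rng.standard_normal(4)
context, a_t = global_attention(h_t, h_src)
```

Frames whose hidden states are most similar to the current state receive the largest alignment weights, which is how the model concentrates on key frames in a long sequence.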