Abstract: To capture features at different scales simultaneously and to separate foreground gestures from background interference accurately, this paper proposes a continuous sign language recognition method based on a multi-scale convolutional neural network, aimed at the recognition difficulties caused by gesture diversity. A sign language sentence segmentation algorithm that uses dominant-hand trajectory information detects the transitional movements in a continuous sign language video and splits the video into multiple composite video segments. The multi-scale convolutional neural network applies convolution kernels of different sizes to capture features at different scales in each composite video segment, which lets it distinguish foreground gestures from background interference. A multi-scale dilated-convolution pooling pyramid module then fuses the multi-scale features of each composite video segment, making full use of the multi-scale information in sign language movements and strengthening the network's ability to handle gesture diversity. A Softmax classifier processes the fused multi-scale features to produce a recognition result for each composite video segment, and the per-segment results are concatenated in chronological order to give the final recognition result. Experimental results show that the proposed method recognizes continuous sign language accurately: under different background interference conditions, the coefficient of determination of its recognition results is close to 1, indicating high recognition accuracy. The proposed method can therefore effectively address the difficulties in continuous sign language recognition.
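The stages named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: every function name is hypothetical, 1-D number lists stand in for video features, fixed averaging kernels stand in for learned weights, and the parallel branches are fused by plain concatenation rather than by the pooling pyramid module. The trajectory-based segmentation step is not shown, since the abstract gives no details of it; segments are assumed to arrive as (start_time, signal) pairs.

```python
import math


def conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D correlation; dilation inserts gaps between kernel taps."""
    span = (len(kernel) - 1) * dilation
    if len(signal) <= span:
        raise ValueError("signal is shorter than the dilated kernel")
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def multi_scale_features(signal, kernel_sizes=(3, 5), dilations=(1, 2)):
    """One globally averaged response per (kernel size, dilation) branch."""
    features = []
    for size in kernel_sizes:
        kernel = [1.0 / size] * size  # placeholder weights (assumption), not learned
        for rate in dilations:
            response = conv1d(signal, kernel, rate)
            features.append(sum(response) / len(response))
    return features  # branches fused here by simple concatenation


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recognize_sentence(segments, classify):
    """Classify each (start_time, signal) segment and join labels in time order."""
    ordered = sorted(segments, key=lambda seg: seg[0])
    return [classify(signal) for _, signal in ordered]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

In this sketch a larger kernel or dilation rate widens the span of input each branch sees, which is the sense in which the branches capture different scales. A value of `r_squared` near 1 means the predictions track the observations closely, matching how the abstract reads its evaluation metric.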
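Basic information:
DOI: 10.16652/j.issn.1004-373x.2026.03.004
Citation information:
[1] 陈昊飞, 狄长安. Research on accurate continuous sign language recognition based on multi-scale convolutional neural networks [J], 2026, 49(3): 19-22. DOI: 10.16652/j.issn.1004-373x.2026.03.004.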
2024-12-13
2024
2025-02-13
2025