Research on CMN-based recognition of Kirgiz with less resources
SUN Jie, Wushour Silamu, Reyiman Tursun
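Abstract:
Speech recognition for minority languages suffers from low recognition accuracy because training data are sparse. For low-resource Kirgiz recognition, this paper proposes a CMN network for building a cross-lingual acoustic model. The CMN model uses the local sampling and weight sharing of CNNs to reduce the number of network parameters, and replaces the CNN convolution kernels with maxout neurons to improve the network's ability to extract abstract features. The cross-lingual CMN is first pre-trained on Uyghur, a comparatively resource-rich language, with dropout regularization used during training to prevent overfitting. Based on the similarity between the two languages, a phoneme mapping set is then created by forced alignment of synonyms and used to label the Kirgiz data to be recognized. Finally, the CMN network parameters are fine-tuned on the limited target-language corpus. Experimental results show that the proposed CMN acoustic model reduces the word error rate (WER) by 8.3% compared with the baseline CNN acoustic model.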
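The maxout neuron mentioned in the abstract can be illustrated with a minimal sketch. It follows the general definition of maxout (each unit outputs the maximum over a group of linear pre-activations) and is not the paper's actual CMN implementation. The function name `maxout` and the group size `k` are assumptions chosen for illustration, since the abstract does not give the network configuration.

```python
def maxout(pre_activations, k):
    """Apply a maxout activation to a flat list of linear pre-activations.

    Consecutive values are split into groups of size k (hypothetical group
    size) and each group is reduced to its maximum, so the output has
    len(pre_activations) // k values.
    """
    if k <= 0 or len(pre_activations) % k != 0:
        raise ValueError("length of pre_activations must be a positive multiple of k")
    return [
        max(pre_activations[i:i + k])
        for i in range(0, len(pre_activations), k)
    ]


# Toy example: six pre-activations grouped in pairs give three maxout outputs.
z = [1.0, -2.0, 0.5, 3.0, -1.0, -4.0]
h = maxout(z, k=2)
print(h)  # [1.0, 3.0, -1.0]
```

Unlike a fixed nonlinearity such as ReLU, the output here is a piecewise linear function whose shape is learned through the weights that produce each group of pre-activations.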
Keywords: speech recognition; low resource; Kirgiz; cross-lingual acoustic model; CMN; phoneme mapping
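Foundation: National Key Basic Research Program of China ("973" Program) (2014CB340506); National Natural Science Foundation of China (61433012); National Natural Science Foundation of China (61363063); Key Laboratory Project of Xinjiang Uygur Autonomous Region (2015KL013)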
Author: SUN Jie, Wushour Silamu, Reyiman Tursun
DOI: 10.16652/j.issn.1004-373x.2018.24.033
References:
- [1] SCHULTZ T, WAIBEL A. Experiments on cross-language acoustic modeling[C/OL]. [2011-09-14]. https://wenku.baidu.com/view/b6ced1edf8c75fbfc77db2d3.html.
- [2] IMSENG D, BOURLARD H, GARNER P N. Using KL-divergence and multilingual information to improve ASR for under-resourced languages[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto: IEEE, 2012: 4869-4872.
- [3] RABINER L R. A tutorial on hidden Markov models and selected applications in speech recognition[J]. Proceedings of the IEEE, 1989, 77(2): 257-286.
- [4] MIAO Y, METZE F, WAIBEL A. Subspace mixture model for low-resource speech recognition in cross-lingual settings[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver: IEEE, 2013: 7339-7343.
- [5] JOY N M, ABRAHAM B, NAVNEETH K, et al. Improved acoustic modeling of low-resource languages using shared SGMM parameters of high-resource languages[C]//Proceedings of 22nd National Conference on Communication. Guwahati: IEEE, 2016: 1-6.
- [6] DAHL G E, YU D, DENG L, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition[J]. IEEE transactions on audio, speech, and language processing, 2012, 20(1): 30-42.
- [7] HUANG J T, LI J, YU D, et al. Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver: IEEE, 2013: 7304-7308.
- [8] GOODFELLOW I J, WARDE-FARLEY D, MIRZA M, et al. Maxout networks[C]//Proceedings of the 30th International Conference on Machine Learning. [S. l.: s. n.], 2013: 1319-1327.
- [9] MIAO Y, METZE F, RAWAT S. Deep maxout networks for low-resource speech recognition[C]//Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding. Olomouc: IEEE, 2014: 398-403.
- [10] CAI M, SHI Y, LIU J. Stochastic pooling maxout networks for low-resource speech recognition[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing. Florence: IEEE, 2014: 3266-3270.
- [11] LECUN Y, KAVUKCUOGLU K, FARABET C. Convolutional networks and applications in vision[C]//Proceedings of IEEE International Symposium on Circuits and Systems. Paris: IEEE, 2010: 253-256.
- [12] WU Hongwei. The language classification of Turkic branch[J]. Language and translation, 1992(1): 19-24.
- [13] Nasirjan Tursun, Wushour Silamu. Uyghur continuous speech recognition system based on HMM[J]. Journal of computer applications, 2009, 29(7): 2009-2011.
- [14] YANG Yating, MA Bo, WANG Lei, et al. Uyghur pronunciation variations in automatic speech recognition systems[J]. Journal of Tsinghua University (Science & technology), 2011, 51(9): 1230-1233.
- [15] Aisikaer Rouzi, YIN Shi, ZHANG Zhiyong, et al. THUYG-20: a free Uyghur speech database[J]. Journal of Tsinghua University (Science & technology), 2017, 57(2): 182-187.