Design and evaluation of Mandarin bi-modal emotion speech database
REN Guofeng, ZHANG Xueying, LI Dong, YAN Jianzheng
Abstract:
To address the scarcity of bi-modal Mandarin Chinese databases that combine articulatory movement parameters with emotional speech, a Mandarin speech corpus covering four emotions (neutral, happy, angry, and sad) was designed. The corpus consists of 1,440 audio recordings with accompanying articulatory movement data produced by 10 subjects; the recorded texts are of two types, disyllabic words and sentences. To verify the validity of the database, an evaluation panel of 10 evaluators with good Mandarin proficiency and normal hearing rated every audio file in the database. The recordings were then screened according to the panel's ratings combined with the stability of the articulatory movement data, yielding a bi-modal emotional speech database with good speech quality and stable articulatory parameters. The database can support research on the articulatory movements of emotional speech, and its two modalities can serve, separately or jointly, as sample data for emotion recognition algorithms, contributing to higher emotional speech recognition rates.
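The screening step described above (keep a recording only if the evaluation panel recognizes its intended emotion and its articulatory trace is stable) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the thresholds, data layout, and the use of standard deviation as a stability proxy are all assumptions.

```python
# Hypothetical sketch of the two-criterion screening described in the
# abstract: perceptual agreement among 10 evaluators, plus stability of
# the articulatory movement parameters. All names and thresholds are
# illustrative assumptions, not taken from the paper.
from statistics import pstdev

def screen_clips(clips, min_agreement=0.7, max_jitter=2.0):
    """clips: list of dicts with keys
       'intended'     - emotion label the speaker was asked to produce,
       'votes'        - emotion labels perceived by the evaluators,
       'articulatory' - a 1-D articulatory parameter track (e.g. mm)."""
    kept = []
    for clip in clips:
        votes = clip["votes"]
        # Fraction of evaluators who heard the intended emotion.
        agreement = votes.count(clip["intended"]) / len(votes)
        # Crude stability proxy: spread of the articulatory track.
        jitter = pstdev(clip["articulatory"])
        if agreement >= min_agreement and jitter <= max_jitter:
            kept.append(clip)
    return kept

# Toy data: one well-recognized, stable clip and one ambiguous clip.
clips = [
    {"intended": "angry",
     "votes": ["angry"] * 8 + ["sad"] * 2,
     "articulatory": [10.0, 10.5, 9.8, 10.2]},
    {"intended": "happy",
     "votes": ["happy"] * 4 + ["neutral"] * 6,
     "articulatory": [12.0, 18.0, 7.0, 15.0]},
]
print(len(screen_clips(clips)))  # → 1 (only the "angry" clip survives)
```

In practice the stability criterion would be computed per sensor and per dimension of the electromagnetic articulography data, but the keep/reject logic has this general shape.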
Keywords: database; emotional speech; articulatory movement parameters; Mandarin Chinese; signal processing; Mandarin speech corpus
Foundation: National Natural Science Foundation of China (61371193); Shanxi Provincial Graduate Innovation Fund (2015BY24); Shanxi Provincial Education Reform and Innovation Project (J2016097)
Authors: REN Guofeng, ZHANG Xueying, LI Dong, YAN Jianzheng
DOI: 10.16652/j.issn.1004-373x.2018.14.045
References:
- [1] HAN Wenjing, LI Haifeng, RUAN Huabin, et al. Review on speech emotion recognition [J]. Journal of software, 2014, 25(1): 37-50.
- [2] ZHANG Xueying, SUN Ying, ZHANG Wei, et al. Key technologies in speech emotion recognition [J]. Journal of Taiyuan University of Technology, 2015, 46(6): 629-636.
- [3] ZHAO Guozhen, SONG Jinjing, GE Yan, et al. Advances in emotion recognition based on physiological big data [J]. Journal of computer research and development, 2016, 53(1): 80-92.
- [4] HAN Wenjing, LI Haifeng. A brief review on emotional speech databases [J]. Intelligent computer and applications, 2013, 3(1): 5-7.
- [5] CHEN Hao, SHI Xuejiao, XIAO Zhiyi, et al. High performance emotional corpus [J]. Computer & digital engineering, 2014, 42(8): 1383-1385.
- [6] AN J, BERRY J J, JOHNSON M T. The electromagnetic articulography Mandarin accented English corpus of acoustic and 3D articulatory kinematic data [C]// Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Florence: IEEE, 2014: 7719-7723.
- [7] NARAYANAN S, NAYAK K, LEE S, et al. An approach to real-time magnetic resonance imaging for speech production [J]. Journal of the Acoustical Society of America, 2004, 115(4): 1771-1776.
- [8] MATSUO K, PALMER J B. Kinematic linkage of the tongue, jaw, and hyoid during eating and speech [J]. Archives of oral biology, 2010, 55(4): 325-331.
- [9] LEE S, YILDIRIM S, KAZEMZADEH A, et al. An articulatory study of emotional speech production [C]// Proceedings of the 9th European Conference on Speech Communication and Technology. Lisbon: [s.n.], 2005: 497-500.
- [10] NEUFELD C, VAN LIESHOUT P. Tongue kinematics in palate relative coordinate spaces for electro-magnetic articulography [J]. Journal of the Acoustical Society of America, 2014, 135(1): 352-361.
- [11] KIM J, TOUTIOS A, LEE S, et al. A kinematic study of critical and non-critical articulators in emotional speech production [J]. Journal of the Acoustical Society of America, 2015, 137(3): 1411-1429.
- [12] WEI J, LIU J, FANG Q, et al. A novel method for constructing 3D geometric articulatory models [J]. Journal of signal processing systems, 2016, 82(2): 295-302.
- [13] WU Dan, LIN Xueyin. The design and realization of a video database for facial expression analysis [J]. Computer engineering and applications, 2004(5): 177-180.
- [14] YUNUSOVA Y, GREEN J R, MEFFERD A. Accuracy assessment for AG500, electromagnetic articulograph [J]. Journal of speech, language, and hearing research, 2009, 52(2): 547-555.
- [15] WANG Baojun, XUE Yuli, YU Shining, et al. The design and evaluation of multimode affective database [J]. China sciencepaper, 2016, 11(2): 214-218.