重庆邮电大学学报(自然科学版)
JOURNAL OF CHONGQING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS (NATURAL SCIENCE EDITION)
2012, Issue 2, pp. 127-132 (6 pages)
鲁棒性%特征提取%均值减%均值方差归一(MVN)%梅尔频率倒谱系数(MFCC)%统计阈值%语音识别
robustness%feature extraction%mean subtraction%mean and variance normalization (MVN)%Mel-frequency cepstrum coefficient (MFCC)%statistical thresholding%speech recognition
近几十年来,语音识别系统已由实验室环境走向真实的世界中.在不同的环境噪声下,识别性能却仍不尽人意,尤其是在低信噪比的环境中.为解决在低信噪比情况下的低识别率的问题,以声学参数MFCC (Mel-frequency cepstrum coefficient)为基础,提出了一种基于统计阈值的倒谱均值方差归一化算法,该算法能进一步减小训练环境和测试环境的不匹配程度,从而提升了语音识别系统对环境噪声的鲁棒性.首先,对输入的语音提取MFCC声学参数,然后对提取的声学参数作均值方差归一化处理,最后采用统计阈值的方法抑制归一化后存在变异的特征.该算法能增加带噪语音特征和纯净语音特征的相似性;与MFCC为基线的系统相比,在低信噪比情况下,该算法的错误率最高下降约40%,同时该方法也优于其他的鲁棒性特征倒谱均值减和倒谱均值归一.
Speech recognition systems have been deployed in real-world applications for several decades, yet their recognition performance remains unsatisfactory under various noise conditions, particularly at low signal-to-noise ratios (SNRs). In this paper, we propose a statistical thresholding method for the cepstral mean and variance normalization technique that further reduces the mismatch between training and testing environments, making an automatic speech recognition system more robust to environmental noise. Mel-frequency cepstrum coefficient (MFCC) features are first extracted as acoustic features and then normalized with the mean and variance normalization method to obtain cepstral mean and variance normalization (CMVN) features. The proposed statistical thresholding method is then applied to suppress features that remain anomalous after normalization. The viability of the proposed approach was verified in experiments with different types of background noise at different SNR levels. In an isolated word recognition task, the experimental results show that the proposed approach reduced the error rate by over 40% in some cases compared with the baseline MFCC front-end, and that under low-SNR conditions it also outperforms other robust features such as cepstral mean subtraction (CMS) and CMVN.
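The pipeline described in the abstract (MFCC extraction, then mean and variance normalization, then thresholding) can be illustrated with a minimal sketch. It assumes the MFCC vectors have already been computed and are passed in as a list of frames. The abstract does not state the thresholding rule, so the clipping of normalized values to a symmetric range used here is an assumption made purely for illustration and may differ from the paper's statistical thresholding; the names `cmvn`, `threshold_features` and `limit` are hypothetical.

```python
import math
from typing import List

Frames = List[List[float]]  # frames x coefficients, e.g. precomputed MFCC vectors


def cmvn(features: Frames, eps: float = 1e-8) -> Frames:
    """Per-utterance cepstral mean and variance normalization (CMVN)."""
    n_frames = len(features)
    n_coeffs = len(features[0])
    # Mean of each coefficient over all frames of the utterance.
    means = [sum(f[d] for f in features) / n_frames for d in range(n_coeffs)]
    # Population standard deviation of each coefficient.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n_frames)
        for d in range(n_coeffs)
    ]
    # Subtract the mean (this step alone is CMS), then divide by the deviation.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(n_coeffs)]
        for f in features
    ]


def threshold_features(normalized: Frames, limit: float = 2.0) -> Frames:
    """ASSUMED rule, not taken from the paper: clip each normalized value
    to the range [-limit, +limit] to suppress outlying features."""
    return [[max(-limit, min(limit, v)) for v in f] for f in normalized]


if __name__ == "__main__":
    # Toy two-coefficient "MFCC" frames; the last frame is a noisy outlier.
    mfcc = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [100.0, 10.0]]
    for frame in threshold_features(cmvn(mfcc), limit=1.5):
        print(frame)
```

After `cmvn`, every coefficient has zero mean and approximately unit variance over the utterance, so a single `limit` expressed in standard deviations applies uniformly to all coefficients; this is the reason the thresholding step is placed after normalization in this sketch.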