自动化与仪器仪表
AUTOMATION & INSTRUMENTATION
2013, Issue 6, pp. 61-62 (2 pages in total)
Keywords: Tibetan; Mongolian; Uighur; GMM; language identification
Language identification is the process by which a computer determines the language spoken in a speech segment. Drawing on how other language identification corpora were built, this paper designs text corpora for three minority languages (Tibetan, Mongolian, and Uighur) and builds a language identification corpus from them. On this corpus, weighted MFCC speech feature parameters are extracted, language identification models are trained with Gaussian mixture models (GMMs), and recognition rates are obtained. Experiments show that, with the available samples, a GMM of order 16 gives the highest recognition rate, exceeding 75%.
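The GMM-based approach summarized in the abstract can be sketched roughly as follows: one mixture model is trained per language, and a test utterance is assigned to the language whose model gives the highest average log-likelihood. This is only an illustrative sketch, not the paper's implementation. The feature vectors are synthetic stand-ins for weighted MFCC frames (the abstract does not specify the weighting), the class `DiagGMM` and the helpers `synth_frames` and `identify` are hypothetical names, diagonal covariances are an assumption, and two mixture components are used instead of 16 to keep the example small (the abstract's order of 16 presumably refers to the number of mixture components).

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Hypothetical minimal GMM with diagonal covariances, trained by EM."""

    def __init__(self, n_components, n_iter=20, seed=0, var_floor=1e-3):
        self.k, self.n_iter, self.var_floor = n_components, n_iter, var_floor
        self.rng = random.Random(seed)

    def _log_terms(self, x):
        # log(weight_j) + log N(x | mean_j, var_j) for each component j
        return [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.variances)]

    def fit(self, frames):
        n, dim = len(frames), len(frames[0])
        # Initialise means from random frames, variances from global variance.
        self.weights = [1.0 / self.k] * self.k
        self.means = [list(f) for f in self.rng.sample(frames, self.k)]
        gmean = [sum(f[d] for f in frames) / n for d in range(dim)]
        gvar = [max(sum((f[d] - gmean[d]) ** 2 for f in frames) / n,
                    self.var_floor) for d in range(dim)]
        self.variances = [list(gvar) for _ in range(self.k)]
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component per frame.
            resp = []
            for f in frames:
                logp = self._log_terms(f)
                norm = logsumexp(logp)
                resp.append([math.exp(lp - norm) for lp in logp])
            # M-step: re-estimate weights, means and variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * f[d] for r, f in zip(resp, frames)) / nj
                                 for d in range(dim)]
                self.variances[j] = [
                    max(sum(r[j] * (f[d] - self.means[j][d]) ** 2
                            for r, f in zip(resp, frames)) / nj, self.var_floor)
                    for d in range(dim)]
        return self

    def score(self, frames):
        """Average per-frame log-likelihood of an utterance."""
        return sum(logsumexp(self._log_terms(f)) for f in frames) / len(frames)


def synth_frames(center, n, rng, dim=4):
    """Synthetic two-cluster frames; NOT real MFCC features."""
    frames = []
    for _ in range(n):
        shift = rng.choice((-1.0, 1.0))
        frames.append([rng.gauss(center + shift, 1.0) for _ in range(dim)])
    return frames


rng = random.Random(42)
# Assumed, arbitrary feature centres for the three languages in the keywords.
centers = {"Tibetan": 0.0, "Mongolian": 4.0, "Uighur": 8.0}
models = {lang: DiagGMM(2).fit(synth_frames(c, 200, rng))
          for lang, c in centers.items()}


def identify(frames):
    """Return the language whose GMM scores the utterance highest."""
    return max(models, key=lambda lang: models[lang].score(frames))


for lang, c in centers.items():
    print(lang, "->", identify(synth_frames(c, 50, rng)))
```

With real data, the synthetic frames would be replaced by feature vectors extracted from recorded speech, and the number of mixture components would be tuned against held-out utterances, which is the kind of comparison the abstract reports.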