COMPUTER ENGINEERING AND APPLICATIONS
2009, Issue 32, pp. 111-113 (3 pages)
Keywords: latent semantic analysis; N-gram; k-means clustering; continuous speech recognition
This paper describes the theory of latent semantic analysis (LSA) and the techniques for implementing LSA-based language modeling in continuous speech recognition systems. An LSA-based semantic model is built on the WSJ0 text corpus and combined with a conventional 3-gram model by interpolation. The result is a hybrid statistical language model (LSA+3-gram) that incorporates semantic information. To further improve the hybrid model, the LSA vector space is clustered with a k-means algorithm whose initial centroids are chosen using a density function. In continuous speech recognition experiments on the WSJ0 test corpus, the LSA+3-gram hybrid model achieves a relative reduction in word error rate of about 13.3% compared with the standard 3-gram baseline.
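The abstract names two LSA-side operations but does not spell them out. The first is interpolating the LSA model with the 3-gram. The second is running k-means in the LSA vector space, with centroids initialized from a density function.

The Python sketch below illustrates both under stated assumptions. It is not the authors' implementation. The word or document vectors are assumed to be available already; in LSA they come from a truncated singular value decomposition of a weighted word-document matrix.

The following pieces are all hypothetical, because the abstract does not specify them:

- the density function, taken here as the number of neighbours within a cosine-distance `radius`;
- the rule of skipping candidates that lie too close to an already chosen centroid;
- the use of cosine distance;
- the linear interpolation weight `lam`.

```python
import math


def cos_dist(a, b):
    """Cosine distance 1 - cos(a, b); LSA vectors are usually compared by angle."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat a zero vector as unrelated to everything
    return 1.0 - dot / (na * nb)


def density(points, i, radius):
    """Hypothetical density function: number of other points within `radius` of points[i]."""
    return sum(1 for j, q in enumerate(points)
               if j != i and cos_dist(points[i], q) <= radius)


def density_init(points, k, radius):
    """Pick k initial centroids: densest points first, skipping any that lie
    within `radius` of an already chosen centroid. Assumes k <= len(points)."""
    order = sorted(range(len(points)),
                   key=lambda i: density(points, i, radius), reverse=True)
    chosen = []
    for i in order:
        if all(cos_dist(points[i], points[c]) > radius for c in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    # Fallback: if too few well-separated dense points exist, add the densest remaining ones.
    for i in order:
        if len(chosen) == k:
            break
        if i not in chosen:
            chosen.append(i)
    return [list(points[i]) for i in chosen]


def kmeans(points, k, radius, max_iter=50):
    """k-means with cosine-distance assignment and density-based initialization."""
    centroids = density_init(points, k, radius)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: cos_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member vectors.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def interpolate(p_trigram, p_lsa, lam):
    """Linear interpolation of two next-word distributions over the same vocabulary.
    `lam` is a hypothetical weight, normally tuned on held-out data."""
    return {w: lam * p_trigram[w] + (1.0 - lam) * p_lsa[w] for w in p_trigram}


if __name__ == "__main__":
    # Toy 2-D "LSA" vectors forming two well-separated directions.
    vecs = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.1, 1.0], [0.0, 0.9], [0.2, 1.0]]
    labels, _ = kmeans(vecs, k=2, radius=0.1)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    print(interpolate({"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.1}, lam=0.5))
```

With the toy vectors, the density ranking picks one seed from each direction, so k-means settles after a single update. Seeding from dense, well-separated points replaces the random starts that make plain k-means sensitive to initialization. The sketch stops at the cluster labels and leaves open how the clusters are used inside the hybrid model.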