数据采集与处理
JOURNAL OF DATA ACQUISITION & PROCESSING
2015
Issue 2
pp. 390-398
9 pages in total
weighted finite-state transducer; spoken term detection; confusion network; factor transducer
To improve the efficiency of spoken term detection, an indexing method is proposed that uses confusion networks instead of lattices within the weighted finite-state transducer (WFST) framework. In the indexing stage, each lattice is first converted into a confusion network and represented as an automaton; a timed factor transducer is then built from each automaton; finally, the index is obtained by taking the union of all factor transducers and optimizing it. In the search stage, each query is converted into an automaton and composed with the index; after optimization, this yields an automaton representing the search results. Experimental results show that, while maintaining the system's detection accuracy, the confusion-network-based WFST index is smaller and faster to search than a WFST index built directly from lattices, and therefore gives better overall system performance.
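The indexing and search stages described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: a plain dictionary stands in for the union of timed factor transducers, and a dictionary lookup stands in for composing the query automaton with the index. The names `add_to_index`, `search`, `EPS`, and `max_span`, the slot layout `(start, end, {word: posterior})`, and the use of the posterior product as a score are all assumptions made for illustration. Summing the scores of paths that yield the same word sequence over the same time span is only a loose analogue of the optimization step.

```python
from collections import defaultdict
from itertools import product

EPS = "<eps>"  # hypothetical label for "no word in this slot"


def add_to_index(index, utt_id, slots, max_span=4):
    """Index every word sequence (factor) readable from consecutive slots.

    slots: list of (start_time, end_time, {word: posterior}).
    index: defaultdict(dict) mapping word tuple -> {(utt, start, end): score}.
    max_span bounds the number of slots per factor to keep the sketch small.
    """
    for i in range(len(slots)):
        for j in range(i, min(len(slots), i + max_span)):
            span = slots[i:j + 1]
            start, end = span[0][0], span[-1][1]
            # One word (or EPS) is chosen per slot along each path.
            for path in product(*(arcs.items() for _, _, arcs in span)):
                # Require real words at both ends so the times are tight.
                if path[0][0] == EPS or path[-1][0] == EPS:
                    continue
                words = tuple(w for w, _ in path if w != EPS)
                score = 1.0
                for _, posterior in path:
                    score *= posterior
                key = (utt_id, start, end)
                # Merge paths with the same words and span by summing scores.
                index[words][key] = index[words].get(key, 0.0) + score


def search(index, query):
    """Return (utt, start, end, score) hits for a query, best score first."""
    hits = index.get(tuple(query.split()), {})
    return sorted(((u, s, e, p) for (u, s, e), p in hits.items()),
                  key=lambda hit: -hit[3])


# Usage: one utterance with three slots; the middle slot is usually empty.
index = defaultdict(dict)
add_to_index(index, "utt1", [
    (0.0, 0.5, {"spoken": 0.9, "broken": 0.1}),
    (0.5, 0.6, {EPS: 0.7, "a": 0.3}),
    (0.6, 1.0, {"term": 0.8, "turn": 0.2}),
])
hits = search(index, "spoken term")
```

In an actual WFST system the enumeration of factors is implicit in the transducer structure, so the index does not grow with the number of paths the way this dictionary does; the sketch only shows what information the index holds and what a query returns.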