数据采集与处理
JOURNAL OF DATA ACQUISITION & PROCESSING
2015, Issue 2, pp. 307-318 (12 pages)
Zhang Xiongwei (张雄伟), Wu Haijia (吴海佳), Zhang Liangliang (张梁梁), Zou Xia (邹霞)
deep learning; deep auto-encoder; reconstructive; low bitrate speech coding; mixed excitation linear prediction
To improve the reconstruction performance of deep models, a cross-entropy-based reconstruction error constraint is added to the traditional contrastive divergence (CD) algorithm. The improved algorithm is used to train a reconstructive deep auto-encoder (RDAE), which replaces the vector quantization of the line spectral frequency (LSF) parameters in the mixed excitation linear prediction (MELP) speech coder. Experimental results show that the improved CD algorithm raises the reconstruction performance of the deep model at the cost of some model likelihood. When the RDAE hidden layer is set to 19 nodes (19 bits), the proposed method outperforms the 25-bit vector quantization method on the weighted LSF distance, the quality of the reconstructed speech, and the spectral distortion, on both the training set and the test set. In other words, the MELP coder improved in this way reduces the coding bitrate from 2.4 kb/s to 2.1 kb/s, a reduction of 12.5%, without degrading speech quality.
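The abstract does not spell out how the cross-entropy constraint enters the CD update. A minimal sketch, assuming the constraint is simply `lam` times the gradient of the cross-entropy between an input and its deterministic reconstruction (tied weights, a single Bernoulli RBM layer, LSFs normalized to [0, 1]), could look like the following. `ReconstructiveRBM`, `train_step`, `lam`, and `train_demo` are hypothetical names, and the toy data is synthetic, not LSF data from the paper; a deep auto-encoder would stack several such layers, whereas this shows only one.

```python
import math
import random


def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class ReconstructiveRBM:
    """Bernoulli RBM trained with CD-1 plus a weighted cross-entropy reconstruction term.

    Hypothetical sketch: `lam` and the tied-weight deterministic reconstruction
    v -> h(v) -> r(h) are assumptions, not the formulation used in the paper.
    """

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(self.W[i][j] * h[j] for j in range(self.nh)))
                for i in range(self.nv)]

    def cross_entropy(self, v):
        """Cross-entropy between v and its deterministic reconstruction."""
        r = self.visible_probs(self.hidden_probs(v))
        eps = 1e-12
        return -sum(vi * math.log(ri + eps) + (1.0 - vi) * math.log(1.0 - ri + eps)
                    for vi, ri in zip(v, r))

    def train_step(self, v0, lr=0.1, lam=1.0):
        # CD-1: positive phase on the data, negative phase after one Gibbs step.
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        # Cross-entropy gradient w.r.t. visible (delta) and hidden (eps_h) pre-activations.
        r = self.visible_probs(h0)
        delta = [r[i] - v0[i] for i in range(self.nv)]
        eps_h = [h0[j] * (1.0 - h0[j]) * sum(self.W[i][j] * delta[i] for i in range(self.nv))
                 for j in range(self.nh)]
        # Ascend the CD likelihood estimate while descending lam * cross-entropy.
        for i in range(self.nv):
            for j in range(self.nh):
                cd = v0[i] * h0[j] - v1[i] * h1[j]
                rec = delta[i] * h0[j] + v0[i] * eps_h[j]  # decoder path + encoder path
                self.W[i][j] += lr * (cd - lam * rec)
        for i in range(self.nv):
            self.b[i] += lr * ((v0[i] - v1[i]) - lam * delta[i])
        for j in range(self.nh):
            self.c[j] += lr * ((h0[j] - h1[j]) - lam * eps_h[j])

    def encode(self, v):
        """Threshold hidden probabilities into an n_hidden-bit code."""
        return [1 if p > 0.5 else 0 for p in self.hidden_probs(v)]

    def decode(self, bits):
        """Map a bit code back to normalized visible values in (0, 1)."""
        return self.visible_probs([float(x) for x in bits])


def train_demo(epochs=50, seed=1):
    """Toy run on synthetic ordered vectors standing in for normalized 10th-order LSFs."""
    rng = random.Random(seed)
    data = [sorted(rng.uniform(0.02, 0.98) for _ in range(10)) for _ in range(20)]
    rbm = ReconstructiveRBM(n_visible=10, n_hidden=19, seed=seed)
    before = sum(rbm.cross_entropy(v) for v in data) / len(data)
    for _ in range(epochs):
        for v in data:
            rbm.train_step(v)
    after = sum(rbm.cross_entropy(v) for v in data) / len(data)
    return before, after, rbm, data
```

`train_demo()` returns the mean reconstruction cross-entropy before and after training; `rbm.encode(v)` then yields a 19-bit index standing in for the 25-bit LSF codebook index, and `rbm.decode(bits)` plays the role of the decoder-side lookup. For scale, assuming the standard 2.4 kb/s MELP frame of 54 bits every 22.5 ms, shrinking the LSF index from 25 to 19 bits leaves 48 bits per frame, i.e. 48 / 0.0225 s ≈ 2133 b/s, which rounds to the 2.1 kb/s reported; the 12.5% figure is (2.4 - 2.1) / 2.4 on the rounded rates.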