天津大学学报
JOURNAL OF TIANJIN UNIVERSITY SCIENCE AND TECHNOLOGY
2015, Issue 8, pp. 670-674 (5 pages)
emotional speech synthesis; quantitative target approximation (qTA); Gaussian bidirectional associative memory (GBAM); F0 transformation
This paper proposes an F0 transformation method for emotional speech synthesis. Quantitative target approximation (qTA) features are used to describe the F0 contour at the syllable level, and a Gaussian bidirectional associative memory (GBAM) converts the syllable-level qTA parameters of synthesized neutral speech into those of the target emotional speech. In the training stage, a neutral speech synthesis system is first built from a neutral corpus using HMM-based statistical parametric speech synthesis. Then, with a small amount of emotional recording data, the GBAM transformation model is trained, taking as source data the qTA parameters extracted from neutral speech synthesized for the text of the emotional recordings, and as target data the qTA parameters extracted from the emotional recordings themselves. In the synthesis stage, the trained GBAM model transforms the syllable-level F0 features of the synthesized neutral speech toward the target emotion. Experimental results indicate that, when little target-emotion data is available, the proposed method achieves better emotional expressiveness than model adaptation based on maximum likelihood linear regression (MLLR).
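The abstract names qTA parameters as the syllable-level F0 description but does not give the model. As a minimal sketch, assuming the commonly published qTA formulation (a linear pitch target approached through a critically damped third-order response), a syllable's contour could be generated from its parameters as below. The function name `qta_contour`, its argument names and the example values are hypothetical and do not come from the paper; the GBAM mapping itself is not shown.

```python
import math


def qta_contour(m, b, lam, f0_init, df0_init, ddf0_init, duration, n_points=50):
    """Generate one syllable's F0 contour under the assumed qTA formulation.

    f0(t) = (m*t + b) + (c1 + c2*t + c3*t**2) * exp(-lam*t)

    m, b  : slope and height of the linear pitch target (assumed names)
    lam   : rate at which the target is approached
    f0_init, df0_init, ddf0_init : F0 and its first two derivatives at
        syllable onset, normally carried over from the previous syllable
    """
    # Coefficients follow from matching the initial conditions at t = 0.
    c1 = f0_init - b
    c2 = df0_init + c1 * lam - m
    c3 = (ddf0_init + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0

    times = [duration * i / (n_points - 1) for i in range(n_points)]
    f0 = [
        (m * t + b) + (c1 + c2 * t + c3 * t * t) * math.exp(-lam * t)
        for t in times
    ]
    return times, f0


# Illustrative values only: a static target of 100 approached from 90.
times, f0 = qta_contour(m=0.0, b=100.0, lam=40.0,
                        f0_init=90.0, df0_init=0.0, ddf0_init=0.0,
                        duration=0.25)
```

Under this assumption, each syllable is summarized by the triple (m, b, lam), which is the kind of compact per-syllable vector a conversion model would map from neutral to emotional speech.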