东南大学学报(自然科学版)
Journal of Southeast University (Natural Science Edition)
2015, No. 5, pp. 817-821 (5 pages)
陶华伟, 查诚, 梁瑞宇, 张昕然, 赵力, 王青云
情感识别; 语谱图; 图像纹理特征; 局部二值模式
emotion recognition; spectrogram; image texture feature; local binary pattern
为研究信号相关性在语音情感识别中的作用,提出了一种面向语音情感识别的语谱图特征提取算法。首先,对语谱图进行处理,得到归一化后的语谱图灰度图像;然后,计算不同尺度、不同方向的 Gabor 图谱,并采用局部二值模式提取 Gabor 图谱的纹理特征;最后,将不同尺度、不同方向 Gabor 图谱提取到的局部二值模式特征进行级联,作为一种新的语音情感特征进行情感识别。柏林库(EMO-DB)及 FAU AiBo 库上的实验结果表明:与已有的韵律、频域、音质特征相比,所提特征的识别率提升3%以上;与声学特征融合后,所提特征的识别率较早期声学特征至少提高5%。因此,利用这种新的语音情感特征可以有效识别不同种类的情感语音。
In order to study the role of signal correlation in speech emotion recognition, a spectrogram feature extraction algorithm for speech emotion recognition is proposed. First, the spectrogram is processed to obtain a normalized gray-scale spectrogram image. Then, Gabor spectrogram images at different scales and orientations are computed, and their texture features are extracted with the local binary pattern (LBP). Finally, the LBP features extracted from the Gabor spectrogram images of different scales and orientations are concatenated to form a new speech emotion feature for emotion recognition. Experimental results on the Berlin database (EMO-DB) and the FAU AiBo corpus show that, compared with existing prosodic, frequency-domain, and voice-quality features, the proposed feature improves the recognition rate by more than 3%; after fusion with acoustic features, it improves the recognition rate by at least 5% over the earlier acoustic features. Therefore, the proposed feature can effectively identify different kinds of emotional speech.
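A minimal sketch of the pipeline described in the abstract (spectrogram, Gabor filtering at several scales and orientations, LBP histograms, concatenation), assuming SciPy and scikit-image; the filter frequencies, LBP parameters, and normalization below are illustrative choices, not the authors' exact settings.

```python
# Illustrative sketch of the spectrogram + Gabor + LBP feature extraction;
# parameter values are assumptions, not the settings reported in the paper.
import numpy as np
from scipy.signal import spectrogram
from skimage.filters import gabor
from skimage.feature import local_binary_pattern

def spectrogram_lbp_features(wave, fs, frequencies=(0.1, 0.2, 0.3, 0.4),
                             orientations=4, lbp_points=8, lbp_radius=1):
    # 1. Spectrogram -> log magnitude -> normalized gray-scale image in [0, 255].
    _, _, sxx = spectrogram(wave, fs=fs)
    img = np.log(sxx + 1e-10)
    img = (img - img.min()) / (img.max() - img.min() + 1e-10)
    img = (img * 255.0).astype(np.uint8)

    # 2. Gabor responses at several scales (frequencies) and orientations.
    # 3. LBP histogram of each response; 4. concatenate all histograms.
    n_bins = lbp_points + 2        # number of 'uniform' LBP codes
    feats = []
    for freq in frequencies:
        for k in range(orientations):
            theta = k * np.pi / orientations
            real, _ = gabor(img, frequency=freq, theta=theta)
            codes = local_binary_pattern(real, lbp_points, lbp_radius,
                                         method='uniform')
            hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins),
                                   density=True)
            feats.append(hist)
    return np.concatenate(feats)   # one texture feature vector per utterance
```

The resulting vector can be fed to any conventional classifier (e.g. an SVM), either alone or concatenated with acoustic features as the abstract describes.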