计算机系统应用
APPLICATIONS OF THE COMPUTER SYSTEMS
2015
Issue 6
pp. 183-187
5 pages in total
SVM; combination kernel function; undesirable text; information identification; recall
In practical undesirable-text identification, most texts overlap with or are even mixed into one another, and this nonlinearly inseparable problem makes identification difficult. Through a nonlinear transformation, an SVM converts the nonlinear problem in the original space into a linear problem in a high-dimensional space, so choosing an appropriate kernel function is the key to the SVM. A single kernel cannot recognize both independent undesirable words and combinations of words, so its recognition accuracy is not high and its recall is also unsatisfactory. For the specific application of undesirable-text identification, a new combination kernel function is constructed from the linear kernel and the polynomial kernel in accordance with Mercer's theorem. The combination kernel retains the advantages of both kernels and can identify independent undesirable words as well as word combinations. The linear kernel, the homogeneous polynomial kernel, and the combination kernel are evaluated in simulation experiments, and the results show that the combination kernel achieves better recognition accuracy and recall than the other kernels.
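
As an illustration of how a linear kernel and a homogeneous polynomial kernel can be combined without violating Mercer's condition, the sketch below forms a weighted sum K(x, y) = w (x . y) + (1 - w) (x . y)^d with 0 <= w <= 1. The abstract does not give the exact form of the combination kernel, so the weighted-sum form, the names `combined_kernel`, `weight`, and `degree`, and the toy feature vectors are all assumptions made for illustration, not the authors' definition. A non-negative weighted sum of Mercer kernels is itself a Mercer kernel, which the example checks numerically through the eigenvalues of the Gram matrix.

```python
import numpy as np


def combined_kernel(X, Y, weight=0.5, degree=2):
    """Weighted sum of a linear and a homogeneous polynomial kernel.

    `weight` and `degree` are illustrative parameters (assumptions),
    not values taken from the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    inner = X @ Y.T  # pairwise inner products: the linear kernel
    # Linear term scores single features (individual words); the
    # polynomial term scores products of features (word combinations).
    return weight * inner + (1.0 - weight) * inner ** degree


# Toy bag-of-words vectors (hypothetical data).
X = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

K = combined_kernel(X, X)

# A valid Mercer kernel yields a symmetric positive semidefinite Gram matrix.
min_eigenvalue = np.linalg.eigvalsh(K).min()
print("symmetric:", np.allclose(K, K.T))
print("positive semidefinite:", min_eigenvalue >= -1e-9)
```

A function of this shape, taking two sample matrices and returning their Gram matrix, could be supplied as a callable kernel to an SVM implementation that accepts one, such as the `kernel` argument of scikit-learn's `SVC`.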