计算机工程 (Computer Engineering)
2014, Issue 1, pp. 199-202 (4 pages)
Keywords: client-side SMS filtering; sample feature library; embedded database; ARM-Linux platform; porting; mutual information
To address the lack of sample libraries for Chinese SMS filtering research, this paper proposes a method for generating a sample feature library on the client. A client-side sample feature database for SMS filtering is designed. Messages received by the client are preprocessed and segmented into Chinese words. The mutual information evaluation function is then improved to account for low-frequency words that carry a large amount of information and for feature words with strong class characteristics, and it is used to extract sample features and form the feature data. The Naive Bayes algorithm is used to test how the number of features affects filter performance. Experimental results show that test accuracy reaches its maximum when the number of features is 10, and that when the sample feature library holds 2,000 messages the database file is about 714.28 KB. The library can therefore run on an ordinary mobile phone platform, which verifies the feasibility of the generation method.
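The pipeline summarized in the abstract (score terms by mutual information, keep the top few as features, classify with Naive Bayes) can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not give the improved evaluation function, so the code uses the standard pointwise mutual information instead; messages are assumed to be already segmented into word lists; the Bernoulli variant of Naive Bayes with add-one smoothing is an illustrative choice; and all function names are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def mutual_information_scores(docs, labels, target="spam"):
    """Score terms by standard pointwise MI with the target class.

    Assumption: this is the textbook formula, not the paper's improved one.
    """
    n = len(docs)
    n_target = sum(1 for y in labels if y == target)
    df_all = Counter()     # number of messages containing the term
    df_target = Counter()  # number of target-class messages containing it
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            df_all[t] += 1
            if y == target:
                df_target[t] += 1
    # MI(t, c) = log( P(t, c) / (P(t) * P(c)) )
    return {t: math.log((a * n) / (df_all[t] * n_target))
            for t, a in df_target.items()}


def top_features(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


def train_naive_bayes(docs, labels, features):
    """Bernoulli Naive Bayes with add-one smoothing over selected features."""
    feature_set = set(features)
    n_docs = Counter(labels)
    present = {c: Counter() for c in n_docs}
    for tokens, y in zip(docs, labels):
        for t in set(tokens) & feature_set:
            present[y][t] += 1
    model = {}
    for c in sorted(n_docs):
        log_prior = math.log(n_docs[c] / len(docs))
        probs = {t: (present[c][t] + 1) / (n_docs[c] + 2) for t in features}
        model[c] = (log_prior, probs)
    return model


def classify(model, tokens):
    """Return the class with the highest log-posterior for one message."""
    seen = set(tokens)
    best, best_score = None, -math.inf
    for c, (log_prior, probs) in model.items():
        score = log_prior
        for t, p in probs.items():
            score += math.log(p if t in seen else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best
```

A typical use would be to call `mutual_information_scores` on the labeled messages, pass the result to `top_features` with the desired feature count (the abstract reports the best accuracy at 10), and then train and apply the classifier on those features. Standard pointwise mutual information tends to favor rare terms, which is one reason a modified evaluation function may be preferable in practice.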