Application Research of Computers (计算机应用研究)
2010, No. 2, pp. 472-474 (3 pages)
Keywords: text categorization (TC); feature weighting; trust computing; probability certainty density; natural language processing
Abstract: Taking a trusted-computing perspective, this paper proposes a feature weighting algorithm for text categorization based on reliable trust recommendation. The method first analyzes how features behave within documents. It then models the trust relationship between features and document categories with the Beta distribution function, builds a feature weight computation model from it, and implements a simple, efficient linear text classifier. The algorithm was compared with TFIDF on the 20newsgroup collection and the Fudan Chinese corpus. The results show that it performs significantly better than TFIDF and adapts well to imbalanced (non-equilibrium) corpora.
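The abstract does not give the paper's formulas, so the Python sketch below is only a hypothetical illustration of the general idea: Beta-distribution trust between features and categories, used as feature weights in a linear classifier. It rests on the following assumptions, none of which come from the paper:

- The trust of feature f toward category c is modelled as Beta(r+1, s+1). Here r counts training documents of c that contain f, and s counts documents of other categories that contain f.
- "Probability certainty density" is read as one common certainty measure for a Beta distribution: half the L1 distance between its pdf and the uniform pdf on [0, 1].
- The feature weight is (2·E[p] − 1) × certainty.

The function names `beta_certainty`, `train_weights` and `classify` are illustrative only.

```python
import math
from collections import defaultdict


def beta_certainty(r: int, s: int, steps: int = 2000) -> float:
    """Certainty of Beta(r+1, s+1): half the L1 distance between its pdf and the
    uniform pdf on [0, 1], approximated with the midpoint rule.
    ASSUMPTION: this is how "probability certainty density" is interpreted here."""
    # log B(r+1, s+1) = lgamma(r+1) + lgamma(s+1) - lgamma(r+s+2)
    log_norm = math.lgamma(r + 1) + math.lgamma(s + 1) - math.lgamma(r + s + 2)
    dx = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx  # midpoints never hit 0 or 1, so the logs are safe
        pdf = math.exp(r * math.log(x) + s * math.log(1.0 - x) - log_norm)
        total += abs(pdf - 1.0) * dx
    return 0.5 * total


def train_weights(docs):
    """docs: list of (tokens, label). Returns weights[label][feature].
    HYPOTHETICAL weight: (2*E[p] - 1) * certainty, where E[p] = (r+1)/(r+s+2)."""
    labels = sorted({label for _, label in docs})
    # doc_freq[feature][label] = number of documents of `label` containing `feature`
    doc_freq = defaultdict(lambda: defaultdict(int))
    for tokens, label in docs:
        for f in set(tokens):
            doc_freq[f][label] += 1
    weights = {c: {} for c in labels}
    for f, per_label in doc_freq.items():
        total = sum(per_label.values())
        for c in labels:
            r = per_label.get(c, 0)  # positive evidence: documents of c with f
            s = total - r            # negative evidence: other-class documents with f
            signed_trust = (r - s) / (r + s + 2)  # equals 2*E[p] - 1 for Beta(r+1, s+1)
            weights[c][f] = signed_trust * beta_certainty(r, s)
    return weights


def classify(tokens, weights):
    """Linear classifier: sum the weights of the document's distinct features
    per class (unseen features contribute 0) and return the best-scoring class."""
    scores = {c: sum(w.get(f, 0.0) for f in set(tokens)) for c, w in weights.items()}
    return max(scores, key=scores.get)


# Toy usage on a made-up four-document corpus (not data from the paper).
train = [
    (["ball", "goal", "team"], "sport"),
    (["goal", "match", "team"], "sport"),
    (["vote", "election", "party"], "politics"),
    (["party", "vote", "policy"], "politics"),
]
w = train_weights(train)
print(classify(["team", "goal", "win"], w))  # prints "sport"
```

With this signed weight, a feature seen equally often in every class gets a weight near zero. A feature with little evidence in either direction is also pulled toward zero by its low certainty. This is one plausible way to make the trust-based weights act as a filter on uninformative terms, which TFIDF does not do explicitly. It is not a claim about how the paper itself combines trust and certainty.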