计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2010, No. 4, pp. 113-116 (4 pages)
Han Hongqi (韩红旗), Zhu Donghua (朱东华), Liu Song (刘嵩), Wang Xuefeng (汪雪锋)
semi-supervised; text classification; class associated words; Expectation-Maximization (EM); Naïve Bayes
This paper addresses the problem of classifying unlabeled text documents when no training set is available. Class associated words are words or phrases that are related to, and representative of, the subject of a class; they supply prior knowledge from which the prior probabilities for document classification are formed. A semi-supervised learning algorithm that combines Expectation-Maximization (EM) with a Naïve Bayes classifier then uses the class associated words to supervise the construction of a classifier for fully unlabeled documents. During learning, the class associated words also impose classification constraints that restrict documents to the corresponding class labels and improve classification accuracy. Experimental results show that the method classifies documents without a training set with high accuracy, and that accuracy with the constraints is higher than accuracy without them.
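The abstract gives no equations or pseudocode, so the following is only a minimal Python sketch of the general technique it names: a multinomial Naïve Bayes classifier trained by EM on unlabeled documents, seeded and constrained by class associated words. Several details are assumptions, not the paper's method: documents are pre-tokenized word lists; the initial P(c|d) is each class's share of a document's class-associated-word hits (uniform if there are none); Laplace smoothing is used; and the constraint is a hypothetical rule that clamps a document to a class when its associated words point to exactly one class. All function names (`initial_posteriors`, `em_naive_bayes`, etc.) are illustrative.

```python
import math
from collections import Counter


def keyword_hits(doc, class_words):
    """Count the tokens of doc that are class associated words of each class."""
    return {c: sum(1 for tok in doc if tok in words) for c, words in class_words.items()}


def initial_posteriors(docs, class_words):
    """Assumed prior: P(c|d) = share of the document's keyword hits that belong to c."""
    classes = list(class_words)
    posts = []
    for doc in docs:
        hits = keyword_hits(doc, class_words)
        total = sum(hits.values())
        if total == 0:  # no class associated words: start from a uniform distribution
            posts.append({c: 1.0 / len(classes) for c in classes})
        else:
            posts.append({c: hits[c] / total for c in classes})
    return posts


def m_step(docs, posts, classes, vocab):
    """Re-estimate Laplace-smoothed P(c) and P(w|c) from soft labels P(c|d)."""
    prior, cond = {}, {}
    for c in classes:
        prior[c] = (1.0 + sum(p[c] for p in posts)) / (len(classes) + len(docs))
        counts = Counter()
        for doc, p in zip(docs, posts):
            for tok, n in Counter(doc).items():
                counts[tok] += n * p[c]  # word counts weighted by class membership
        denom = len(vocab) + sum(counts.values())
        cond[c] = {w: (1.0 + counts[w]) / denom for w in vocab}
    return prior, cond


def e_step(docs, prior, cond, classes):
    """Compute P(c|d) proportional to P(c) * prod_w P(w|c), in log space for stability."""
    posts = []
    for doc in docs:
        logp = {c: math.log(prior[c]) + sum(math.log(cond[c][t]) for t in doc) for c in classes}
        top = max(logp.values())
        expd = {c: math.exp(v - top) for c, v in logp.items()}
        z = sum(expd.values())
        posts.append({c: v / z for c, v in expd.items()})
    return posts


def apply_constraints(docs, posts, class_words):
    """Hypothetical constraint: clamp documents whose keywords match exactly one class."""
    out = []
    for doc, p in zip(docs, posts):
        matched = [c for c, n in keyword_hits(doc, class_words).items() if n > 0]
        if len(matched) == 1:
            out.append({c: 1.0 if c == matched[0] else 0.0 for c in p})
        else:
            out.append(p)
    return out


def em_naive_bayes(docs, class_words, iterations=10, constrained=True):
    """Run a fixed number of EM rounds and return the most probable class per document."""
    classes = list(class_words)
    vocab = sorted({tok for doc in docs for tok in doc})
    posts = initial_posteriors(docs, class_words)
    for _ in range(iterations):
        prior, cond = m_step(docs, posts, classes, vocab)
        posts = e_step(docs, prior, cond, classes)
        if constrained:
            posts = apply_constraints(docs, posts, class_words)
    return [max(p, key=p.get) for p in posts]


if __name__ == "__main__":
    docs = [
        "goal match striker team".split(),
        "election vote party minister".split(),
        "team coach season striker".split(),
        "parliament vote law minister".split(),
        "season fans stadium".split(),  # contains no class associated words
    ]
    class_words = {
        "sports": {"goal", "striker", "coach"},
        "politics": {"election", "parliament", "minister"},
    }
    print(em_naive_bayes(docs, class_words))
```

In this toy run the last document contains no class associated words; it is labeled "sports" because EM learns from the constrained documents that "season" co-occurs with sports vocabulary. Passing `constrained=False` drops the clamping step and leaves only the keyword-based initialization to guide EM, which mirrors (in spirit only) the with/without-constraint comparison the abstract reports.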