宜宾学院学报
JOURNAL OF YIBIN UNIVERSITY
2011
Issue 6
pp. 71-74
4 pages
Rocchio%text classification%incremental classification%semi-supervised%hierarchical clustering
Traditional text classification algorithms have two shortcomings. First, they ignore the conflict between the relatively fixed features of the training set and the continually changing topics of new documents. Second, every sample in the training set belongs to exactly one class and the classes have no hierarchical relationship, which makes classification less accurate and less efficient. To address both problems, an incremental semi-supervised text classification algorithm called IC-Rocchio is designed and implemented. It can generate new classes incrementally and can also derive a multi-level hierarchical relationship between classes. Experimental results show that the algorithm effectively mitigates both shortcomings.
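
The abstract does not give the details of IC-Rocchio, so the following is only a minimal sketch of the general idea it names: a centroid-based (Rocchio-style) classifier that absorbs unlabelled documents and opens a new class when a document is not close enough to any existing centroid. It is not the authors' algorithm. The class name `IncrementalCentroidClassifier`, the `new_class_threshold` parameter, the `new_N` label scheme, and the term-frequency weighting are all assumptions made for illustration, and the hierarchical clustering step is not shown.

```python
import math
from collections import Counter


def vectorize(text):
    """L2-normalised term-frequency vector (assumed weighting, not from the paper)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class IncrementalCentroidClassifier:
    """Hypothetical sketch: positive-centroid form of Rocchio (no negative term)."""

    def __init__(self, new_class_threshold=0.2):
        self.threshold = new_class_threshold  # assumed cut-off for opening a class
        self.sums = {}    # label -> summed document vectors
        self.counts = {}  # label -> number of documents absorbed

    def _add(self, label, vec):
        total = self.sums.setdefault(label, {})
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
        self.counts[label] = self.counts.get(label, 0) + 1

    def fit(self, labelled_docs):
        """Build the initial centroids from (text, label) pairs."""
        for text, label in labelled_docs:
            self._add(label, vectorize(text))

    def classify(self, text, update=True):
        """Assign the nearest class, or create a new one below the threshold."""
        vec = vectorize(text)
        best_label, best_sim = None, -1.0
        for label, total in self.sums.items():
            # Cosine is scale-invariant, so the summed vector stands in for the mean.
            sim = cosine(vec, total)
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_label is None or best_sim < self.threshold:
            best_label = f"new_{len(self.sums)}"  # hypothetical naming scheme
        if update:
            # Semi-supervised step: the unlabelled document updates the centroid.
            self._add(best_label, vec)
        return best_label


clf = IncrementalCentroidClassifier(new_class_threshold=0.2)
clf.fit([
    ("stock market shares trading", "finance"),
    ("football match goal team", "sports"),
])
print(clf.classify("market shares fall"))
print(clf.classify("protein folding enzyme"))
```

In this sketch a document that shares no vocabulary with the existing classes has similarity 0 to every centroid and therefore starts a new class, which later documents on the same topic can join. A hierarchy between classes could then be built by clustering the centroids, but how IC-Rocchio does this is not stated in the abstract.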