Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Moshi Shibie yu Rengong Zhineng
2014
Issue 7
pp. 646-654
9 pages in total
Semi-Supervised Clustering%Affinity Message Propagation%Strong Classification Features%Class Similarity
To handle large-scale, high-dimensional, sparse document data, a semi-supervised text clustering algorithm based on strong classification features affinity propagation (SCFAP) is proposed. During clustering, a small amount of labeled data is used to extract feature terms with strong category-discriminating power, and these terms are used to construct a more effective similarity measure between samples. After each round of iteration, the unlabeled samples with the highest category certainty are moved from the unlabeled set to the labeled set, which improves the execution efficiency of the algorithm. Experimental results show that these changes considerably improve the performance and accuracy of the classical affinity propagation algorithm, and that SCFAP applies well to two dissimilar datasets, Reuters-21578 and 20 Newsgroups. Judged by both the micro-averaged Fμ index and the cluster purity index Pt, the algorithm quickly obtains good clustering results with the help of a small amount of supervision.
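The abstract names cluster purity as one of its two evaluation measures but does not spell out its formula. As a minimal sketch, assuming the standard definition (each cluster is credited with its most frequent true label, and the credited counts are divided by the total number of documents), purity could be computed as below. The function name `cluster_purity` and the toy labels are illustrative assumptions, not taken from the paper, and the paper's Pt index may differ in detail.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Fraction of documents that carry the majority label of their cluster.

    Assumes the standard purity definition; the paper's Pt may differ.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one document is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Each cluster contributes the size of its largest single-label group.
    majority_total = sum(max(c.values()) for c in label_counts.values())
    return majority_total / len(cluster_ids)


# Toy example: 6 documents in 2 clusters, with hypothetical topic labels.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["earn", "earn", "acq", "acq", "acq", "earn"]
print(cluster_purity(clusters, labels))
```

In the toy example, each cluster has two documents with its majority label, so 4 of the 6 documents count toward purity. A purity of 1.0 means every cluster contains documents of a single class. The measure increases trivially as the number of clusters grows, which may be one reason the abstract pairs it with the micro-averaged F index.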