计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2013
Issue 14
pp. 126-129, 146
5 pages in total
topic detection; word co-occurrence network; Genetic Clustering Algorithm (GCA); word clustering algorithm
Topic detection methods based on word clustering commonly suffer from unstable clustering results, because the results depend heavily on how the clustering objects are initialized. To address this, the document set is modeled as a word co-occurrence network, a filtering method is designed to simplify the network, and a Genetic Clustering Algorithm (GCA) based on the word co-occurrence network is then proposed to extract hot topics from web documents. Compared with existing methods, the topics detected by the proposed method are relatively stable, which is confirmed experimentally, so the method is of greater practical value in real applications.
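The first two steps named in the abstract, modeling a document set as a word co-occurrence network and filtering it, can be illustrated with a minimal sketch. The abstract does not specify the paper's filtering rule or its genetic clustering step, so everything below is an assumption made for illustration: the function names `build_cooccurrence_network` and `filter_network`, the document-level co-occurrence counting, the `min_count` threshold, and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(docs):
    """Count, for each unordered word pair, how many documents contain both words."""
    edges = Counter()
    for tokens in docs:
        # Deduplicate and sort so each pair has one canonical form and a
        # document contributes at most 1 to any pair.
        for pair in combinations(sorted(set(tokens)), 2):
            edges[pair] += 1
    return edges


def filter_network(edges, min_count=2):
    """Keep only edges whose count reaches min_count (an assumed filtering rule)."""
    return {pair: n for pair, n in edges.items() if n >= min_count}


# Toy, hypothetical tokenized documents.
docs = [
    ["earthquake", "rescue", "relief"],
    ["earthquake", "rescue", "donation"],
    ["election", "vote", "candidate"],
    ["election", "vote", "earthquake"],
]
network = filter_network(build_cooccurrence_network(docs), min_count=2)
print(sorted(network.items()))
# [(('earthquake', 'rescue'), 2), (('election', 'vote'), 2)]
```

In this sketch the surviving edges form the simplified network that a clustering step would then partition into topics; the clustering itself is omitted because the abstract gives no details of the GCA.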