Pattern Recognition and Artificial Intelligence (模式识别与人工智能)
Moshi Shibie yu Rengong Zhineng
2015, Issue 1, pp. 1-10 (10 pages)
Microblog; Hot Topic Detection; Aging Theory; Hot Term Extraction; Multi-label Propagation
With the rapid growth of microblog data, extracting hot topics from large volumes of microblog posts has become an active research area. Because traditional hot term extraction methods are poorly suited to microblog data, a term life value calculation model based on aging theory is proposed to extract hot terms, and a term co-occurrence network is then built from the correlations between hot terms. Traditional term clustering algorithms handle hot terms shared by several topics poorly and are inefficient on large data sets. To address both problems, a term clustering method based on multi-label propagation (TCMLPA), which has nearly linear time complexity, is designed to cluster the hot terms in the co-occurrence network and obtain the set of hot topics. Experimental results show that the life value calculation model filters noise and extracts hot terms effectively, and that TCMLPA improves the accuracy and efficiency of hot topic detection while keeping the clustering results stable.
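The first two steps of the abstract (scoring terms with an aging-style life value, then linking hot terms that co-occur) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the update rule (multiplicative decay plus a frequency-proportional gain), the parameter names `gain`, `decay`, and `threshold`, and the use of raw co-occurrence counts as edge weights. The function names are hypothetical, and the TCMLPA clustering step is not reproduced.

```python
from collections import Counter, defaultdict
from itertools import combinations


def life_values(slots, gain=1.0, decay=0.5):
    """Return {term: life value} after processing all time slots.

    slots: list of time slots; each slot is a list of posts, and each
    post is a list of terms.
    ASSUMED rule (not the paper's formula): in every slot a term keeps
    `decay` of its previous life value and gains `gain` per post that
    mentions it, so terms that stop appearing fade out.
    """
    life = defaultdict(float)
    for posts in slots:
        # Count each term at most once per post.
        freq = Counter(term for post in posts for term in set(post))
        for term in set(life) | set(freq):
            life[term] = life[term] * decay + gain * freq[term]
    return dict(life)


def hot_terms(life, threshold=1.0):
    """Keep terms whose life value reaches an assumed threshold."""
    return {term for term, value in life.items() if value >= threshold}


def cooccurrence_network(posts, hot):
    """Return {(u, v): count} for hot terms appearing in the same post.

    Edge weights are plain co-occurrence counts, an assumed stand-in
    for the correlation measure used in the paper.
    """
    edges = Counter()
    for post in posts:
        present = sorted(set(post) & hot)
        for u, v in combinations(present, 2):
            edges[(u, v)] += 1
    return dict(edges)


# Toy data: two time slots of tokenized posts.
slots = [
    [["a", "b"], ["a", "c"]],
    [["a", "b"]],
]
life = life_values(slots)            # {'a': 2.0, 'b': 1.5, 'c': 0.5}
hot = hot_terms(life)                # {'a', 'b'}
all_posts = [post for posts in slots for post in posts]
network = cooccurrence_network(all_posts, hot)   # {('a', 'b'): 2}
```

In this toy run the term `c` appears only in the first slot, so its life value decays below the threshold and it is filtered out as noise, while `a` and `b` remain and are joined by an edge of weight 2. An overlapping clustering algorithm such as multi-label propagation would then run on this weighted graph.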