情报杂志
Journal of Intelligence
2015
Issue 11
pp. 183-187
5 pages
词共现图%族群兴趣%Clauset PageRank
word co-occurrence graph%group interest%Clauset PageRank
传统的话题识别方法实现对新闻媒体信息流中新话题的自动识别,主要针对长文本信息,不适用于数据稀疏的微博客。为此,本文提出一种以用户语言为基础的话题词库,构建主题词共现图进行微博客话题识别。在此基础上,分别用Clauset算法及PageRank算法进行了模块化的聚类。前者从内容视角发现了不同的兴趣簇群,其社区结构较为扁平化;后者从人的视角发现了不同的兴趣簇群,群意见领袖均为现实社会的权威人物,其社区结构呈现较明显的层级性。
Traditional topic detection methods automatically identify new topics in news media information streams. They are designed mainly for long texts and are not suited to microblogs, where the data are sparse. To address this, this paper proposes a topic lexicon based on users' own language and builds a topic-word co-occurrence graph for microblog topic identification. On this basis, the Clauset algorithm and the PageRank algorithm are each used to perform modular clustering. The former finds distinct interest clusters from the perspective of content, and its community structure is relatively flat. The latter finds distinct interest clusters from the perspective of people: the opinion leaders of the clusters are all authority figures in the real world, and its community structure is markedly hierarchical.
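
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy posts, the lexicon, the whitespace tokenization, and the function names `build_cooccurrence_graph` and `pagerank` are all assumptions made for illustration. Real microblog text in Chinese would need a word segmenter, and the abstract does not specify which graph PageRank is run on; here it is applied to the word graph only to keep the example small. The Clauset community-detection step is omitted.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(posts, lexicon):
    """Return {word: {neighbor: weight}}, counting lexicon words that share a post."""
    graph = defaultdict(lambda: defaultdict(int))
    for post in posts:
        # Keep only lexicon words; the set makes each word count once per post.
        words = sorted(set(post.split()) & lexicon)
        for a, b in combinations(words, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on an undirected weighted graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    # Total edge weight per node; every node in the graph has at least one edge.
    strength = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbor m passes on rank in proportion to the edge weight.
            incoming = sum(rank[m] * w / strength[m] for m, w in graph[n].items())
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


# Hypothetical toy data, already tokenized by whitespace.
posts = [
    "football match tonight",
    "football match ticket",
    "movie ticket tonight",
    "movie star",
]
lexicon = {"football", "match", "ticket", "movie", "tonight", "star"}

graph = build_cooccurrence_graph(posts, lexicon)
ranks = pagerank(graph)
for word in sorted(ranks, key=ranks.get, reverse=True):
    print(word, round(ranks[word], 3))
```

Restricting the graph to lexicon words is what keeps it usable on short, sparse posts: only terms known to carry topical meaning become nodes, so edges reflect shared topics rather than incidental vocabulary.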