山东农业科学
Shandong Agricultural Sciences
2015
Issue 10
pp. 112-115
4 pages in total
Thesaurus%Agricultural public opinion topic%Semantic similarity%Undirected graph%Clustering
To discover agricultural public opinion topics efficiently, this paper proposes a thesaurus-based topic detection algorithm. The algorithm first builds a descriptor dictionary from the Agricultural Thesaurus (《农业叙词表》), general-purpose thesauri, and new Internet words, and uses it as the dictionary for Chinese word segmentation software. It then applies TF-IDF to weight the feature words, selects the top P feature words to represent each text, and computes word similarity from the relationships between descriptors. Finally, it builds an undirected graph with descriptors as nodes and discovers hot network topics by clustering that graph. The analysis shows that the algorithm's minimum detection cost is 0.3534 and that it runs more efficiently than traditional algorithms.
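The pipeline described in the abstract (TF-IDF weighting, top-P feature selection, similarity-based undirected graph, clustering) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the toy documents, the hand-written similarity table standing in for thesaurus relationships, the 0.5 threshold, and the use of connected components as the clustering step. The abstract does not say which similarity formula or clustering method the authors used.

```python
import math
from collections import Counter, defaultdict


def top_p_terms(docs, p):
    """For each tokenized document, return its p highest-weighted TF-IDF terms."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    selected = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Sort by descending weight; break ties alphabetically for determinism.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        selected.append(ranked[:p])
    return selected


def cluster_terms(terms, similarity, threshold):
    """Link terms whose similarity reaches the threshold, then return the
    connected components of the resulting undirected graph.
    Connected components are an assumed stand-in for the paper's clustering."""
    terms = sorted(set(terms))
    graph = defaultdict(set)
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if similarity(a, b) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    seen, clusters = set(), []
    for start in terms:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(component)
    return clusters


# Hypothetical similarity table standing in for thesaurus relationships.
RELATED = {
    frozenset(("pork", "hog")): 0.9,
    frozenset(("hog", "market")): 0.6,
    frozenset(("drought", "yield")): 0.7,
    frozenset(("drought", "irrigation")): 0.8,
}


def toy_similarity(a, b):
    return RELATED.get(frozenset((a, b)), 0.0)


# Toy, already-segmented documents (hypothetical data).
docs = [
    ["pork", "price", "rise", "pork"],
    ["hog", "price", "market"],
    ["wheat", "drought", "yield"],
    ["drought", "wheat", "irrigation"],
]

features = top_p_terms(docs, p=2)
all_terms = [t for doc_terms in features for t in doc_terms]
clusters = cluster_terms(all_terms, toy_similarity, threshold=0.5)
print(clusters)
```

In this sketch each resulting cluster of terms plays the role of one candidate topic. A real system would replace `toy_similarity` with a measure derived from the thesaurus relations and would tokenize raw text with a Chinese word segmenter using the descriptor dictionary.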