现代情报
Journal of Modern Information
2011
Issue 8
pp. 21-24
text visualization; high-frequency words; k-means clustering algorithm; radial tree layout
To explore how close or distant the contextual relations between high-frequency words are, this paper studies an algorithmic pipeline for visualizing the high-frequency words in English texts and implements the visualization. We first use a statistical algorithm to extract the high-frequency words and the contexts linking them from an English text. We then define three types of connection between words, compute a relation degree for each pair of words that share a context, and cluster these relation degrees with the k-means algorithm to reflect how close or distant the word relations are. Finally, we visualize the clustering results with a radial tree layout. This visual form allows readers to grasp the content of an English text quickly.
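The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the three connection types or the formula for the relation degree, so this sketch substitutes a single assumed measure: how often two high-frequency words co-occur within a small token window. All function names (`top_words`, `relation_degrees`, `kmeans_1d`, `radial_layout`) are hypothetical, and the two-level radial tree is only one plausible reading of the layout step, not the paper's implementation.

```python
import math
import re
from collections import Counter


def top_words(text, n, stopwords=frozenset()):
    """Return (tokens, the n most frequent lowercase words), skipping stopwords."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]
    return tokens, [w for w, _ in Counter(tokens).most_common(n)]


def relation_degrees(tokens, vocab, window=2):
    """ASSUMPTION: relation degree = co-occurrence count within `window` tokens."""
    vocab = set(vocab)
    degrees = Counter()
    for i, a in enumerate(tokens):
        if a not in vocab:
            continue
        for b in tokens[i + 1 : i + 1 + window]:
            if b in vocab and b != a:
                degrees[tuple(sorted((a, b)))] += 1
    return degrees


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalars; returns (centroids, labels)."""
    uniq = sorted(set(values))
    k = min(k, len(uniq))
    # Deterministic init: spread the starting centroids over the distinct values.
    centroids = [float(uniq[(i * (len(uniq) - 1)) // max(k - 1, 1)]) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


def radial_layout(clusters):
    """Two-level radial tree: root at the origin, clusters on ring 1, leaves on ring 2.

    Each cluster receives an angular wedge proportional to its number of leaves.
    """
    total = sum(len(leaves) for leaves in clusters.values())
    pos = {"root": (0.0, 0.0)}
    start = 0.0
    for name, leaves in clusters.items():
        wedge = 2 * math.pi * len(leaves) / total
        mid = start + wedge / 2
        pos[name] = (math.cos(mid), math.sin(mid))
        for j, leaf in enumerate(leaves):
            angle = start + wedge * (j + 0.5) / len(leaves)
            pos[leaf] = (2 * math.cos(angle), 2 * math.sin(angle))
        start += wedge
    return pos


if __name__ == "__main__":
    sample = ("Text visualization helps readers. Visualization of text clusters "
              "words. Words in text form clusters.")
    tokens, vocab = top_words(sample, 4, stopwords=frozenset({"of", "in"}))
    degrees = relation_degrees(tokens, vocab)
    pairs = list(degrees)
    _, labels = kmeans_1d([degrees[p] for p in pairs], k=2)
    clusters = {}
    for pair, lab in zip(pairs, labels):
        clusters.setdefault(f"cluster{lab}", []).append("-".join(pair))
    for node, (x, y) in radial_layout(clusters).items():
        print(f"{node}: ({x:.2f}, {y:.2f})")
```

The coordinates returned by `radial_layout` can be passed to any plotting library. Clustering the scalar relation degrees, rather than the words themselves, follows the abstract's wording that k-means is applied to the relation degrees.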