情报学报 (Journal of the China Society for Scientific and Technical Information)
2009, No. 6, pp. 821-826 (6 pages)
Keywords: C-value; TF-IDF; CV-IDF; citation analysis; topic recognition
Citation analysis is an important method and technique in scientific and technical information analysis. In particular, citation cluster analysis based on bibliographic coupling and co-citation has gradually become one of the most active research areas in the field. Citation cluster analysis produces a series of clusters of scientific and technical papers, but these clusters do not directly reveal their topics, so their content features must be identified. This paper analyzes typical approaches to topic recognition for paper clusters in citation analysis and their limitations, and then proposes a topic recognition method that combines the C-value and TF-IDF algorithms. Experimental results show that the proposed method makes full use of the strengths of both algorithms and corrects their unreasonable aspects, so it can be better applied to topic recognition of paper clusters in citation analysis.
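The abstract does not state how C-value and TF-IDF are combined. As an illustrative sketch only, the Python below implements the commonly cited C-value measure (Frantzi, Ananiadou and Mima) and a plain TF-IDF, then combines them under an assumption suggested only by the keyword CV-IDF: a term's C-value within one cluster is multiplied by its inverse document frequency across clusters, with each cluster treated as one document. The function names `c_value`, `tf_idf` and `cv_idf` and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def _contains(longer: tuple, shorter: tuple) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq: dict) -> dict:
    """Standard C-value for the candidate terms of one cluster.

    freq maps each candidate term (a tuple of words) to its frequency.
    """
    scores = {}
    for a, f_a in freq.items():
        # T_a: longer candidate terms in which `a` is nested
        nesting = [b for b in freq if len(b) > len(a) and _contains(b, a)]
        weight = math.log2(len(a))  # equals 0 for single-word terms
        if nesting:
            # discount occurrences of `a` explained by the longer terms
            f_a = f_a - sum(freq[b] for b in nesting) / len(nesting)
        scores[a] = weight * f_a
    return scores


def _idf(cluster_freqs: list) -> dict:
    """IDF of every term, treating each cluster as one document."""
    n = len(cluster_freqs)
    df = Counter(t for freq in cluster_freqs for t in freq)
    return {t: math.log(n / d) for t, d in df.items()}


def tf_idf(cluster_freqs: list) -> list:
    """Plain TF-IDF per cluster, using raw term frequency as TF."""
    idf = _idf(cluster_freqs)
    return [{t: f * idf[t] for t, f in freq.items()} for freq in cluster_freqs]


def cv_idf(cluster_freqs: list) -> list:
    """Hypothetical combination: C-value within a cluster times IDF over clusters."""
    idf = _idf(cluster_freqs)
    result = []
    for freq in cluster_freqs:
        cv = c_value(freq)
        result.append({t: cv[t] * idf[t] for t in freq})
    return result


if __name__ == "__main__":
    # Toy data (invented for illustration): term frequencies in two paper clusters.
    cluster_a = {("citation", "analysis"): 5,
                 ("co", "citation", "analysis"): 2,
                 ("topic",): 3}
    cluster_b = {("citation", "analysis"): 1,
                 ("term", "extraction"): 4}
    for i, scores in enumerate(cv_idf([cluster_a, cluster_b])):
        # rank each cluster's terms by CV-IDF; "citation analysis" occurs in
        # every cluster, so its IDF (and therefore its CV-IDF) is 0
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        print(i, [(" ".join(t), round(s, 3)) for t, s in ranked])
```

With the standard formula, a single-word term gets log2(1) = 0 and therefore a C-value of 0, whereas TF-IDF still scores it; this is one concrete way the two measures differ, but whether it is among the shortcomings the paper corrects cannot be determined from the abstract.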