计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2014
Issue 19
pp. 222-226
5 pages in total
extraction; multi-features; compound words; directed graph; news Web page
Considering the characteristics of Chinese news Web pages, this paper combines multiple features, including statistical, positional, and part-of-speech (POS) features, to evaluate the weight of candidate keywords. To address the problem that some word segmentation results do not reflect the theme well, it proposes a compound-word generation method based on a directed graph, which aims to find high-frequency adjacent words and join them into compound words. Experimental results show that the method offers a considerable improvement in efficiency over the conventional TF-IDF method and extracts keywords from news Web pages effectively.
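The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names, not the authors' implementation. It assumes the text has already been segmented into tokens. The function names, the min_count threshold, the title_boost factor, and the POS weights are all hypothetical choices made for illustration. The directed graph stores one edge per ordered pair of adjacent words, weighted by how often the pair occurs. The score multiplies a statistical feature (term frequency), a position feature (title membership), and a POS feature.

```python
from collections import Counter, defaultdict


def build_adjacency_graph(sentences):
    """Directed graph: edge (a -> b) counts how often b immediately follows a."""
    graph = defaultdict(Counter)
    for tokens in sentences:
        for left, right in zip(tokens, tokens[1:]):
            graph[left][right] += 1
    return graph


def compound_words(graph, min_count=2):
    """Return (left, right, count) for edges seen at least min_count times.

    min_count is an assumed threshold; the abstract only says "high-frequency".
    """
    pairs = [
        (left, right, count)
        for left, followers in graph.items()
        for right, count in followers.items()
        if count >= min_count
    ]
    # Most frequent pairs first; ties broken alphabetically for determinism.
    return sorted(pairs, key=lambda p: (-p[2], p[0], p[1]))


def score_candidates(body_tokens, title_tokens, pos_tags,
                     title_boost=2.0, pos_weights=None):
    """Combine statistical, position, and POS features into one weight.

    The multiplicative combination and all constants are assumptions.
    """
    if pos_weights is None:
        pos_weights = {"n": 1.0, "v": 0.5}  # hypothetical: favour nouns
    tf = Counter(body_tokens)
    total = sum(tf.values()) or 1
    scores = {}
    for word, count in tf.items():
        statistical = count / total
        position = title_boost if word in title_tokens else 1.0
        pos = pos_weights.get(pos_tags.get(word, ""), 0.1)
        scores[word] = statistical * position * pos
    return scores


# Usage with pre-segmented sentences (segmentation itself is out of scope here).
sentences = [["人工", "智能", "发展"], ["人工", "智能", "应用"], ["智能", "应用"]]
graph = build_adjacency_graph(sentences)
pairs = compound_words(graph, min_count=2)
```

In this sketch a pair such as ("人工", "智能") that recurs across sentences becomes a compound-word candidate, which could then be scored alongside single words with score_candidates.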