计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2014, Issue 21, pp. 143-146 (4 pages)
Keywords: search results clustering; suffix tree; cluster label; Chinese search; clustering
Search result clustering helps users quickly locate the information they need. This paper focuses on clustering Chinese text while generating high-quality cluster labels. The method retrieves the titles and abstracts of the web pages returned by a search engine, segments the text with a word segmentation tool, and removes stop words. It then builds a single suffix tree, inserting the text into the tree nodes word by word, and scores the words at each node under four constraints: word frequency, word length, part of speech, and position. Finally, it merges the base clusters and takes the highest-scoring node words as labels. Experimental results show that the resulting clusters have high purity and that the extracted labels are accurate and highly discriminative, which makes the results convenient for users.
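The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no scoring formula, so the `score` function, the `pos_weight` table, the stop list, and the 0.5 overlap threshold for merging (borrowed from the classic suffix tree clustering merge rule) are all assumptions, and every function name is hypothetical. For brevity it uses a depth-limited word-level suffix trie rather than a compact suffix tree, and it expects text that has already been segmented into words.

```python
from collections import defaultdict

STOP_WORDS = {"的", "了", "和"}  # illustrative stop list, not from the paper


class Node:
    """One node of a word-level suffix trie: children by word, plus doc ids."""
    def __init__(self):
        self.children = {}
        self.docs = set()


def build_suffix_trie(docs, max_depth=3):
    """Insert every word-level suffix of every document, cut at max_depth words."""
    root = Node()
    for doc_id, words in enumerate(docs):
        for start in range(len(words)):
            node = root
            for word in words[start:start + max_depth]:
                node = node.children.setdefault(word, Node())
                node.docs.add(doc_id)
    return root


def base_clusters(root, min_docs=2):
    """Return (phrase, doc ids) for each node shared by at least min_docs docs."""
    found, stack = [], [((), root)]
    while stack:
        phrase, node = stack.pop()
        if phrase and len(node.docs) >= min_docs:
            found.append((phrase, frozenset(node.docs)))
        for word, child in node.children.items():
            stack.append((phrase + (word,), child))
    return found


def first_index(words, phrase):
    """Index of the first occurrence of phrase in words (len(words) if absent)."""
    n = len(phrase)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) == phrase:
            return i
    return len(words)


def score(phrase, doc_ids, docs, pos_weight=None):
    """Assumed score: frequency x length x part-of-speech weight x position bonus."""
    freq = len(doc_ids)                      # number of documents sharing the phrase
    length = len(phrase)                     # phrase length in words
    pos = 1.0 if pos_weight is None else pos_weight.get(phrase[-1], 1.0)
    # Phrases that appear earlier in a document get a larger bonus.
    position = sum(1.0 / (1 + first_index(docs[d], phrase)) for d in doc_ids) / freq
    return freq * length * pos * (1 + position)


def cluster(docs, pos_weight=None, threshold=0.5):
    """Remove stop words, build base clusters, merge them, and pick labels."""
    docs = [[w for w in d if w not in STOP_WORDS] for d in docs]
    bases = base_clusters(build_suffix_trie(docs))
    parent = list(range(len(bases)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge two base clusters when their document sets overlap heavily both ways.
    for i in range(len(bases)):
        for j in range(i + 1, len(bases)):
            a, b = bases[i][1], bases[j][1]
            common = len(a & b)
            if common / len(a) > threshold and common / len(b) > threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, base in enumerate(bases):
        groups[find(i)].append(base)

    result = []
    for members in groups.values():
        doc_ids = set().union(*(m[1] for m in members))
        best = max(members, key=lambda m: score(m[0], m[1], docs, pos_weight))
        result.append(("".join(best[0]), sorted(doc_ids)))
    return result


# Pre-segmented example titles (made up for illustration).
example_docs = [
    ["苹果", "手机", "的", "价格"],
    ["苹果", "手机", "评测"],
    ["果树", "种植", "方法"],
    ["果树", "种植", "技术"],
]
clusters = cluster(example_docs)
print(clusters)
```

In this toy input the two-word phrases outscore their single-word parts because the assumed score multiplies by phrase length, so longer shared phrases are preferred as labels. A real system would replace `pos_weight` with weights derived from a part-of-speech tagger and tune the score against labeled data.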