淮阴工学院学报
Journal of Huaiyin Institute of Technology
2015
Issue 5
pp. 18-24
7 pages
朱全银%潘禄%刘文儒%李翔%张永军%刘金岭
scientific-related news%text categorization%TF-IDF%extraction algorithm
Websites carry a large volume of news unrelated to science and technology, which lowers the practical value of news gathered from the Web. Taking news from government news websites, Phoenix (凤凰网), and similar sources as the research setting, this work adopts TF-IDF text weighting, designs a multilevel binary classification model for science and technology news, and implements a TF-IDF-based system that classifies and extracts such news. In experiments on 200,000 news documents spanning more than 4,000 categories, the system achieved a recognition accuracy of 85.3% for science and technology news and a recognition rate of 82.9% for other news. The results offer a practically useful reference model for classifying and extracting science and technology news from the Web.
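As an illustration of the TF-IDF weighting named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes documents are already tokenized, and it uses one common variant (term frequency normalized by document length, inverse document frequency as log(N / df)); the record does not say which variant the paper uses. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical): two science-flavored items and one other item.
docs = [
    ["new", "chip", "research"],
    ["chip", "design", "research"],
    ["city", "festival", "opens"],
]
weights = tf_idf(docs)
```

In this toy corpus, "new" occurs in one document and "chip" in two, so "new" receives the larger weight in the first document; a term that occurs in every document receives weight zero. Weight vectors of this kind could then feed a binary classifier for science versus other news, which is not sketched here because the record gives no details of the model.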