计算机技术与发展
COMPUTER TECHNOLOGY AND DEVELOPMENT
2013
Issue 4
pp. 135-138
4 pages in total
text information%three-layer filtering%vector space model%thematic tendency
To improve the efficiency of text information filtering, a three-layer filtering system for text information is proposed. The system is organized into two horizontal parts and three vertical layers. The first layer filters by IP and URL address. The second layer computes keyword frequency and weight statistics, calculated separately for the title, the keywords, and the body text of a message. The third layer filters by content feature analysis and brings in word segmentation, keyword weight calculation, the vector space model (VSM), and thematic tendency analysis to keep the identification of harmful information both efficient and accurate. Experiments show that the system filters well: its recall and precision are clearly better than those of the KNN method, and in real-time filtering it can block the spread of harmful information promptly.
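The layered design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the blocklist, keyword weights, per-field multipliers, topic vector, thresholds, and all function names are hypothetical, and a whitespace split stands in for a real word segmenter. The abstract does not say how the layers are combined, so the sketch assumes each layer can block a message on its own, with the cheapest check running first.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical configuration, for illustration only.
BLOCKED_HOSTS = {"bad.example.com", "203.0.113.7"}
KEYWORD_WEIGHTS = {"casino": 3.0, "jackpot": 2.0, "bet": 1.0}
# Assumed multipliers: a hit in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}
# Assumed term-weight vector describing the unwanted topic.
TOPIC_VECTOR = {"casino": 0.8, "jackpot": 0.5, "bet": 0.3, "win": 0.2}


def tokenize(text):
    """Lowercase whitespace tokens; a stand-in for real word segmentation."""
    tokens = (t.strip(".,!?;:").lower() for t in text.split())
    return [t for t in tokens if t]


def layer1_blocked(url):
    """Layer 1: reject messages whose host (name or IP) is on the blocklist."""
    return urlparse(url).hostname in BLOCKED_HOSTS


def layer2_score(doc):
    """Layer 2: weighted keyword frequency, computed per field and summed."""
    score = 0.0
    for field, field_weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(doc.get(field, "")))
        hits = sum(KEYWORD_WEIGHTS.get(t, 0.0) * n for t, n in counts.items())
        score += field_weight * hits
    return score


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def layer3_similarity(doc):
    """Layer 3: VSM similarity between the body text and the topic vector."""
    counts = Counter(tokenize(doc.get("body", "")))
    return cosine({t: float(n) for t, n in counts.items()}, TOPIC_VECTOR)


def classify(doc, score_threshold=5.0, sim_threshold=0.5):
    """Run the layers in order and stop at the first one that blocks."""
    if layer1_blocked(doc["url"]):
        return "blocked:address"
    if layer2_score(doc) >= score_threshold:
        return "blocked:keywords"
    if layer3_similarity(doc) >= sim_threshold:
        return "blocked:content"
    return "allowed"
```

Ordering the checks from cheapest to most expensive means most unwanted traffic is rejected before any vector computation runs, which is one plausible reading of how a layered filter could improve efficiency in real-time use.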