CAJ | Academic paper

Search engine ranking spam raises a page's position in search results by inflating its relevance to search queries. To address this, the paper introduces the concept of a spam tendency rate, derived from the characteristics of spam pages, to measure how likely a page is to be spam. Because web spam can be produced through a variety of techniques, the rate is assessed from two aspects. For content spam, the paper uses the noun density of the page content itself: since most search keywords are nouns, the further a page's proportion of nouns exceeds a given threshold, the more likely the page is to contain content spam. For link spam, the paper uses the page's link characteristics: starting from blacklisted pages, it computes a link spam tendency rate iteratively and weights it according to each page's distance from the blacklist. Finally, the two aspects are combined into an overall spam tendency rate for each page. With PageRank, TrustRank, and BadRank as baselines, the experimental results confirm the hypothesis about the part of speech of search keywords and demonstrate the effectiveness of the link spam detection algorithm.
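The abstract does not give the exact formulas, so the following is only a minimal Python sketch of the three steps under stated assumptions, not the paper's method. Assumed: the content score is 0 up to a noun-ratio threshold and rises linearly above it; tokens arrive already part-of-speech tagged (no Chinese word segmenter is run), with a hypothetical noun tag set `NOUN_TAGS`; link badness is propagated backward from the blacklist with a BadRank-style update, BR(p) = (1 - d)·E(p) + d·Σ BR(q)/C(q) over the pages q that p links to, where E(p) is 1 for blacklisted pages and C(q) is the in-degree of q; the distance weight is a geometric decay `decay ** hops`; and the two scores are mixed by a weighted sum. The function names and the parameters `threshold`, `d`, `decay`, and `alpha` are all hypothetical.

```python
from collections import deque

# Hypothetical noun tags (ICTCLAS/jieba-style); adjust to whatever tagger is used.
NOUN_TAGS = {"n", "nr", "ns", "nt", "nz"}


def content_spam_rate(tagged_tokens, threshold=0.5):
    """Assumed content score: 0 at or below the noun-ratio threshold, linear above it.

    tagged_tokens is a list of (word, pos_tag) pairs; requires 0 <= threshold < 1.
    """
    if not tagged_tokens:
        return 0.0
    nouns = sum(1 for _, tag in tagged_tokens if tag in NOUN_TAGS)
    ratio = nouns / len(tagged_tokens)
    if ratio <= threshold:
        return 0.0
    return (ratio - threshold) / (1.0 - threshold)


def distances_to_blacklist(out_links, blacklist):
    """BFS over reversed edges: number of link hops from each page to the nearest blacklisted page."""
    in_links = {}
    for p, targets in out_links.items():
        for q in targets:
            in_links.setdefault(q, []).append(p)
    dist = {p: 0 for p in blacklist}
    queue = deque(blacklist)
    while queue:
        q = queue.popleft()
        for p in in_links.get(q, []):
            if p not in dist:  # first visit gives the shortest hop count
                dist[p] = dist[q] + 1
                queue.append(p)
    return dist


def link_spam_rate(out_links, blacklist, d=0.85, decay=0.5, iters=50):
    """BadRank-style backward propagation from the blacklist, weighted by decay ** hops, scaled to max 1."""
    pages = set(out_links) | set(blacklist)
    for targets in out_links.values():
        pages.update(targets)
    in_degree = {p: 0 for p in pages}
    for targets in out_links.values():
        for q in targets:
            in_degree[q] += 1
    seed = {p: 1.0 if p in blacklist else 0.0 for p in pages}
    bad = dict(seed)
    for _ in range(iters):
        # Jacobi update: the comprehension reads the previous iteration's scores.
        bad = {
            p: (1 - d) * seed[p] + d * sum(bad[q] / in_degree[q] for q in out_links.get(p, ()))
            for p in pages
        }
    dist = distances_to_blacklist(out_links, blacklist)
    scores = {p: bad[p] * decay ** dist[p] if p in dist else 0.0 for p in pages}
    top = max(scores.values(), default=0.0)
    return {p: s / top for p, s in scores.items()} if top > 0 else scores


def spam_tendency(content_rate, link_rate, alpha=0.5):
    """Assumed combination: weighted sum of the content and link scores."""
    return alpha * content_rate + (1 - alpha) * link_rate


if __name__ == "__main__":
    graph = {"a": ["spam"], "b": ["a"], "c": ["good"], "good": [], "spam": []}
    link = link_spam_rate(graph, {"spam"})
    tokens = [("cheap", "a"), ("watch", "n"), ("watch", "n"), ("rolex", "nz")]
    print(spam_tendency(content_spam_rate(tokens), link["a"]))
```

In the toy graph, page `a` links directly to the blacklisted page and `b` reaches it in two hops, so `a` scores above `b`, while `c` and `good` never reach the blacklist and score 0. The geometric decay is one simple way to realize the "weight by distance from the blacklist" idea; a lookup table of per-hop weights would work equally well.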