系统工程理论与实践
Systems Engineering—Theory & Practice
2015
Issue 2
pp. 445-457
搜索引擎 搜索引擎优化 网页排序 排名作弊 文本内容 链接结构
search engine; search engine optimization; page ranking; ranking spam; text content; link structure
搜索引擎排序作弊通过提高网页与搜索请求的相关性, 达到提高搜索排名的目的. 为此, 根据作弊网页的特征, 引入作弊倾向系数这一概念来衡量网页作弊的可能性. 网页作弊通过多种手段实现, 鉴于此本文基于网页内容本身的名词密度特征, 衡量页面内容作弊的可能性, 由于搜索关键词大部分为名词, 超过一定名词比例阈值的页面, 其内容作弊的可能性越大. 根据页面的链接特征, 衡量页面链接作弊的可能性, 从黑名单页面通过迭代计算链接作弊系数, 并根据与黑名单页面的距离设置权重. 最终从上述两方面特征来综合考量页面的作弊倾向系数. 选取PageRank, TrustRank, BadRank为基线实验, 实验结果验证了关于检索词性分析的假设以及链接作弊检测算法的有效性.
Search engine ranking spam raises a page's search ranking by inflating its apparent relevance to search queries. Based on the characteristics of spam pages, this paper introduces the concept of a spam tendency rate to measure how likely a page is to be spam. Web spam is carried out by several means. For content spam, the paper uses the noun density of the page content: because most search keywords are nouns, the further a page's noun proportion exceeds a given threshold, the more likely it is to be content spam. For link spam, the paper uses the page's link characteristics: a link spam tendency rate is computed iteratively starting from blacklisted pages, with weights set according to each page's distance from the blacklisted pages. The two measures are then combined into an overall spam tendency rate for the page. With PageRank, TrustRank, and BadRank as baselines, the experimental results verify the hypothesis about the part of speech of query terms and the effectiveness of the link spam detection algorithm.
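The two measures described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract gives no formulas, so the linear content score, the BadRank-style backward propagation, the `decay ** distance` weighting, the weighted-sum combination, and every name and default value (`threshold`, `decay`, `damping`, `alpha`) are assumptions made for illustration. Tokens are assumed to arrive already part-of-speech tagged, with noun tags starting with "N".

```python
from collections import deque


def noun_density(tagged_tokens):
    """Fraction of (word, tag) tokens whose tag starts with 'N' (assumed noun tags)."""
    if not tagged_tokens:
        return 0.0
    nouns = sum(1 for _, tag in tagged_tokens if tag.upper().startswith("N"))
    return nouns / len(tagged_tokens)


def content_spam_score(density, threshold=0.6):
    """Assumed scoring: 0 up to the threshold, then linear up to 1 at density 1."""
    if density <= threshold:
        return 0.0
    return (density - threshold) / (1.0 - threshold)


def distances_to_blacklist(out_links, blacklist):
    """Fewest link hops from each page to a blacklisted page (BFS on reversed links)."""
    reverse = {}
    for src, targets in out_links.items():
        for dst in targets:
            reverse.setdefault(dst, []).append(src)
    dist = {page: 0 for page in blacklist}
    queue = deque(blacklist)
    while queue:
        page = queue.popleft()
        for src in reverse.get(page, []):
            if src not in dist:
                dist[src] = dist[page] + 1
                queue.append(src)
    return dist


def link_spam_scores(out_links, blacklist, decay=0.5, damping=0.85, iterations=20):
    """BadRank-style iteration seeded on the blacklist, then weighted by distance."""
    pages = set(out_links) | set(blacklist)
    pages |= {dst for targets in out_links.values() for dst in targets}
    in_degree = {p: 0 for p in pages}
    for targets in out_links.values():
        for dst in targets:
            in_degree[dst] += 1
    seed = {p: 1.0 if p in blacklist else 0.0 for p in pages}
    score = dict(seed)
    for _ in range(iterations):
        # A page inherits badness from the pages it links to.
        score = {
            p: (1 - damping) * seed[p]
            + damping * sum(score[q] / in_degree[q] for q in out_links.get(p, []))
            for p in pages
        }
    dist = distances_to_blacklist(out_links, blacklist)
    # Assumed weighting: pages further from the blacklist are discounted more;
    # pages that cannot reach the blacklist get 0.
    return {p: score[p] * decay ** dist[p] if p in dist else 0.0 for p in pages}


def spam_tendency(content_score, link_score, alpha=0.5):
    """Assumed combination: weighted sum of the content and link scores."""
    return alpha * content_score + (1 - alpha) * link_score
```

In this sketch a page that links directly to a blacklisted page scores higher than one that reaches it in two hops, and a page with no path to the blacklist scores zero on the link side, so only its noun density contributes to the combined value.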