COMPUTER APPLICATIONS AND SOFTWARE
2014
Issue 3
pp. 16-19, 30 (5 pages in total)
Fuzzy SVDD; PageRank; Focused crawler
A focused crawler is a web crawler that collects resources from a specific domain. To ensure the precision of a focused crawler, this article proposes a PageRank crawler algorithm supervised by fuzzy SVDD (support vector domain description), which both considers the link relations among pages and uses a suitable classifier as supervision to keep the crawler from drifting off topic. Experimental comparison with a keyword-matching focused crawler, a shark-search focused crawler, a PageRank focused crawler, an SVM-prediction-based focused crawler, and an ordinary SVDD-guided focused crawler verifies that the proposed algorithm achieves higher precision.
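
The general idea of combining link-based ranking with classifier supervision can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the article: the link graph is a small in-memory dictionary that is fully known in advance, PageRank is computed by plain power iteration, and the hypothetical `is_on_topic` callback stands in for the fuzzy SVDD classifier, whose details the abstract does not give. The names `pagerank`, `supervised_crawl`, `links`, and `budget` are likewise illustrative.

```python
import heapq


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [outlinked pages]}."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


def supervised_crawl(seed, links, is_on_topic, budget=10):
    """Visit pages in descending PageRank order; expand only on-topic pages.

    is_on_topic is a placeholder for a trained classifier (assumption:
    it takes a page identifier and returns True or False).
    """
    rank = pagerank(links)
    frontier = [(-rank[seed], seed)]  # max-heap via negated scores
    seen = {seed}
    collected = []
    while frontier and len(collected) < budget:
        _, page = heapq.heappop(frontier)
        if not is_on_topic(page):
            # Classifier veto: do not keep the page or follow its links.
            continue
        collected.append(page)
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-rank[nxt], nxt))
    return collected
```

In this sketch the classifier acts as a gate on the frontier: a highly ranked page that is judged off topic is discarded and its outlinks are never queued, which is one simple way to keep a PageRank-driven crawl from drifting away from the topic. A real crawler would discover the graph incrementally and would have to estimate ranks from the pages fetched so far.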