计算机与现代化
COMPUTER AND MODERNIZATION
2009
Issue 8
pp. 73-75, 79
4 pages in total
page segmentation%block importance weight%XPath%Web information extraction
Page segmentation reduces the unit of Web information extraction from the whole page to the block. This paper reviews the main page segmentation approaches and the learning-based block importance model, and analyzes XPath-based Web extraction. Combining the strengths of the two, it proposes a Web information extraction method that couples the block importance model with XPath, discusses its design process, and gives a formal description and experimental results. The results indicate that the method is well suited to extracting data from multi-record Web pages.