计算机工程与应用
COMPUTER ENGINEERING AND APPLICATIONS
2014
Issue 19
pp. 199-204
6 pages
Web information extraction%forest product trade semantic dictionary%semantic information entropy%template%target information location
Existing Web information extraction techniques suffer from low accuracy, a low degree of automation, and poor generality. To meet the need for structured storage of information sources in forest product trade Web information push, this paper proposes a new semantics-based algorithm for extracting forest product trade Web information. The algorithm analyzes and exploits the characteristics of forest product trade Web information and, following the basic principles of semantic recognition, constructs a forest product trade semantic dictionary. Using the layout features of the target information within Web pages together with information entropy theory, it then introduces a method based on semantic information entropy that automatically locates and extracts the target information, which is stored in a database in structured form. Experiments on real forest product trade Web pages show that the algorithm reduces manual intervention and has good application value for processing information sources in forest product trade information push.
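The abstract does not define semantic information entropy, so the following is only an illustrative sketch of one possible reading, not the paper's algorithm. It assumes that each page block is scored by the Shannon entropy of the semantic categories its dictionary terms fall into, and that the block with the highest score is taken as the target region. The dictionary entries, the category names, and the functions `semantic_entropy` and `locate_target_block` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical mini semantic dictionary: term -> semantic category.
# The entries are illustrative and are not taken from the paper.
SEMANTIC_DICT = {
    "plywood": "product",
    "timber": "product",
    "price": "price",
    "usd": "price",
    "supplier": "party",
    "buyer": "party",
    "quantity": "quantity",
}


def semantic_entropy(text, dictionary=SEMANTIC_DICT):
    """Shannon entropy (in bits) of the semantic categories matched in text."""
    tokens = text.lower().split()
    counts = Counter(dictionary[t] for t in tokens if t in dictionary)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no dictionary terms: the block carries no semantic signal
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def locate_target_block(blocks, dictionary=SEMANTIC_DICT):
    """Return the index of the block with the highest semantic entropy.

    Assumption: a trade listing mentions several categories (product, price,
    party, quantity) fairly evenly, while navigation or news blocks do not.
    """
    scores = [semantic_entropy(b, dictionary) for b in blocks]
    return max(range(len(blocks)), key=scores.__getitem__)


blocks = [
    "home about contact login",
    "plywood price 320 usd supplier acme quantity 500",
    "timber timber timber news",
]
target_index = locate_target_block(blocks)
```

Under these assumptions the second block is selected, because it is the only one whose terms spread across several semantic categories. A real system would tokenize Chinese text and derive blocks from the page layout, both of which are omitted here.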