Computer Engineering (计算机工程)
2010
Issue 7
pp. 66-67, 70
3 pages in total
deep Web; query entrance; form filling
To address the problem that the large volume of data in the deep Web cannot be indexed by traditional search engines, this paper applies improved heuristic rules to identify the query entrances of forms in extracted Web pages. When form labels are matched against content, it uses an improved semantics-based similarity matching algorithm to fill in the form content. Experimental results show that the accuracy of form label extraction reaches 94.23%, the matching success rate reaches 88.83%, and the form filling success rate reaches 95.43%.
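
The abstract does not give the paper's heuristic rules or its similarity formula, so the following Python sketch only illustrates the general shape of such a pipeline under loudly stated assumptions. The rule in `is_query_entrance`, the `SYNONYMS` table, the `threshold` value and all function names are hypothetical, and a small synonym table plus `difflib` string similarity stands in for a real semantic similarity measure.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class FormScanner(HTMLParser):
    """Collects, for each <form>, the types of its <input> controls."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one list of input types per completed form
        self._current = None  # input types of the form being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = []
        elif tag == "input" and self._current is not None:
            # An <input> without a type attribute defaults to a text box.
            self._current.append((attrs.get("type") or "text").lower())

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def is_query_entrance(input_types):
    # Hypothetical rule (not from the paper): a query form has a text box
    # and no password field, which excludes typical login forms.
    return "text" in input_types and "password" not in input_types


# Hypothetical synonym groups standing in for a semantic resource.
SYNONYMS = [{"author", "writer", "creator"}, {"title", "name", "heading"}]


def similarity(label, attribute):
    """Return 1.0 for listed synonyms, else a character-level ratio."""
    a, b = label.lower().strip(), attribute.lower().strip()
    if any(a in group and b in group for group in SYNONYMS):
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def fill_form(labels, record, threshold=0.6):
    """Map each form label to the value of its best-matching attribute."""
    filled = {}
    for label in labels:
        best = max(record, key=lambda attr: similarity(label, attr))
        if similarity(label, best) >= threshold:
            filled[label] = record[best]
    return filled
```

In this sketch a label is filled only when its best candidate clears the threshold, so unmatched labels are left empty rather than filled with a poor guess. That separation between matching and filling mirrors the distinct matching and filling success rates reported above, although the paper's own procedure may differ.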