上海交通大学学报(英文版)
JOURNAL OF SHANGHAI JIAOTONG UNIVERSITY
Year: 2002
Issue: 1
Pages: 15-22 (8 pages in total)
Keywords: data mining; web mining; web usage mining; log analysis; interestingness enhancement
This paper studies improvements to mining frequently visited groups of web pages. First, in the data preprocessing phase, we introduce an extra frame-filtering step that reduces the negative influence of frame pages on the resulting page groups. By recognizing the frame pages in the site documents and constructing the frame-subframe relation set, the subframe pages that influence the final mining result can be filtered efficiently. Second, we enhance the mining algorithm to take into account both the site topology and the content of the web pages. By introducing the normalized content-link ratio of a web page and the group interlink degree of a page group, the enhanced algorithm concentrates on content pages that are less densely interlinked. Experiments show that the new approach effectively reveals more interesting page groups that would not be found without these enhancements.
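The two enhancements can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names, the ratio `text / (text + links)` normalized by the site maximum, the interlink degree as the fraction of directly linked page pairs, and the way the two are combined with support into a score. It is not the authors' algorithm.

```python
from itertools import combinations


def filter_subframes(session, frame_children):
    """Drop subframe pages whose parent frame page occurs in the same session.

    frame_children is the frame-subframe relation set: frame URL -> set of
    subframe URLs that the browser loads automatically (assumed structure).
    """
    auto_loaded = set()
    for page in session:
        auto_loaded.update(frame_children.get(page, ()))
    return [page for page in session if page not in auto_loaded]


def normalized_content_link_ratios(pages):
    """pages: URL -> (text_chars, link_count). Returns URL -> ratio in [0, 1].

    Assumed definition: text / (text + links), divided by the site maximum.
    """
    raw = {}
    for url, (text_chars, link_count) in pages.items():
        total = text_chars + link_count
        raw[url] = text_chars / total if total else 0.0
    top = max(raw.values(), default=0.0)
    return {url: (r / top if top else 0.0) for url, r in raw.items()}


def group_interlink_degree(group, links):
    """Fraction of page pairs in the group joined by a direct hyperlink.

    links: URL -> set of URLs it links to (assumed site topology format).
    """
    pairs = list(combinations(sorted(group), 2))
    if not pairs:
        return 0.0
    linked = sum(
        1 for a, b in pairs
        if b in links.get(a, ()) or a in links.get(b, ())
    )
    return linked / len(pairs)


def interest_score(group, support, ratios, links):
    """Hypothetical score: favors content-rich, weakly interlinked groups."""
    mean_ratio = sum(ratios[p] for p in group) / len(group)
    return support * mean_ratio * (1.0 - group_interlink_degree(group, links))
```

Under these assumptions, a frequently co-visited group scores higher when its pages carry mostly content and are not already connected by hyperlinks, which matches the abstract's stated emphasis on content pages that are less interlinked.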