情报学报
2014
Issue 8
836-843
8 pages in total
自动标引%副主题词%分面检索
automatic indexing%subheadings%faceted search
本文研究了科技文献副主题词的自动抽取问题,并对其在分面检索中的应用进行了探索。为实现副主题词的自动标引,本文提出了以标题作为抽取数据源、基于规则进行抽取的实现方法,并以图情领域文献进行了实验。结果显示,基于规则的抽取方法在召回率和准确率方面表现良好,均超过了90%;但仅以标题作为抽取数据源会导致召回率偏低,仅有49.9%的文献能抽取出副主题词。为探索副主题词在分面检索中的应用,本文以图情领域文献为例构建了原型系统,从使用效果来看,副主题词作为独立的检索点价值不大,但和其他检索点配合使用则可以更贴切地表达用户需求,作为分面则能在帮助用户进行探索式检索以及结果筛选方面发挥重要作用。本研究的局限性包括仅采用标题作为副主题词抽取数据源,导致召回率不高;在副主题词抽取时未考虑同时抽取相应的主题词等。
This paper investigates the automatic indexing of subheadings in scientific and technical literature and explores their application in faceted search. To index subheadings automatically, it proposes a rule-based method that uses document titles as the extraction source, and tests the method on library and information science literature. The results show that the rule-based extraction performs well, with recall and precision both exceeding 90%; however, using titles as the only extraction source results in low recall, since subheadings can be extracted from only 49.9% of the documents. To explore the use of subheadings in faceted search, the paper builds a prototype system for library and information science literature. In use, subheadings have little value as an independent search point, but combined with other access points they express user needs more precisely, and as a facet they play an important role in supporting exploratory search and result filtering. The limitations of this study include relying solely on titles as the extraction source, which results in low recall, and not extracting the corresponding subject headings alongside the subheadings.