科研信息化技术与应用
E-science Technology & Application
2013
Issue 2
pp. 59-66
8 pages in total
郭学兵%何洪林%唐新斋%苏文
Thesaurus%Data resource retrieval%Thesaurus management system%‘Broader and Narrower’ relation%‘Used For’ relation%‘Related’ relation
To better serve researchers with ecological data resources through the data management and information sharing platform of the Chinese Ecosystem Research Network (CERN), CERN needs to keep improving the platform, including the efficiency and reliability with which users search its data resources. This paper analyzes the advantages and disadvantages of three retrieval approaches (navigational browsing, topic-based search, and keyword search) and focuses on how introducing a thesaurus into keyword search can improve the recall, precision, and response speed of retrieval. It introduces the concept of a thesaurus and the method used to construct the CERN ecology thesaurus, and describes how TemaTres, an open-source thesaurus management system, was localized into Chinese. The localization covers four functions: browsing the terms of the CERN thesaurus, expanding search keywords according to the semantic relations between terms in the thesaurus database, auto-completing keywords as users type, and searching the metadata of CERN ecological data resources with the expanded keywords. The Chinese version of TemaTres has been built and put into operation, which makes the compilation of keywords in CERN ecological metadata more controlled and standardized. It also introduces some simple semantic relations between keywords into CERN metadata retrieval, namely the hierarchical (‘Broader and Narrower’), equivalence (‘Used For’, i.e. synonym), and associative (‘Related’) relations, which improves search efficiency and lays a good foundation for the next step of building an ecological ontology.