浙江大学学报(工学版)
JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)
2015
Issue 4
pp. 717-723, 775
8 pages in total
张晗%罗森林%邹丽丽%石秀民
人名消歧%句义结构模型%句义分析%自然语言处理
personal name disambiguation%sentential semantic model%sentential semantic analysis%natural language processing
在构造文本特征空间的基础上,提出融合句义分析的三阶段人名消歧方法.该方法针对查询词常作为普通词出现的特点,在文本预处理后采用启发式规则的后处理方法判断查询词是否指人名,根据特征模板提取局部名实体特征及职业.通过句义结构模型进行句义分析,提取句义特征,利用词袋模型统计词频,构成三层特征空间,使用基于规则的分类和两阶段层次聚类算法实现人名消歧.引入重叠系数计算句义特征相似度,在CLP2012中文人名消歧语料上进行实验,F达到88.79%,证明了将句义分析应用到跨文本人名消歧的效果良好.
A three-stage personal name disambiguation method that incorporates sentential semantic analysis was proposed, based on the construction of a text feature space. Because query terms often occur as common words, heuristic post-processing rules were applied after document preprocessing to determine whether the query term refers to a personal name. Local named-entity features and occupations were then extracted according to feature templates. The sentential semantic model was used for sentential semantic analysis and for extracting sentential semantic features, and word frequencies were counted with the bag-of-words model, which together formed a three-layer feature space. Rule-based classification and a two-stage hierarchical clustering algorithm were used to perform the name disambiguation. The overlap coefficient was introduced to compute the similarity of the sentential semantic features. Experiments on the CLP2012 Chinese personal name disambiguation corpus achieved an F value of 88.79%, indicating that applying sentential semantic analysis to cross-document personal name disambiguation is effective.
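The abstract names the overlap coefficient as the similarity measure for sentential semantic features but does not give its form. A minimal sketch follows, assuming the standard (Szymkiewicz-Simpson) definition, |A ∩ B| / min(|A|, |B|), applied to features treated as sets. The function name, the example feature sets, and the handling of empty inputs are illustrative assumptions and are not taken from the paper.

```python
def overlap_coefficient(a, b):
    """Standard overlap coefficient of two collections, treated as sets.

    Defined as |A ∩ B| / min(|A|, |B|). Returning 0.0 when either set is
    empty is a convention chosen for this sketch, not something the
    abstract specifies.
    """
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


# Hypothetical feature sets for two documents mentioning the same name.
doc1 = {"professor", "teach", "university", "physics"}
doc2 = {"professor", "university", "lecture"}

# Two shared features, smaller set has three members.
print(overlap_coefficient(doc1, doc2))
```

Because the denominator is the size of the smaller set, a short document whose features are all contained in a longer one scores 1.0, which is one plausible reason to prefer this measure over Jaccard similarity when documents differ greatly in length.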