昆明学院学报
JOURNAL OF KUNMING UNIVERSITY
2011, Issue 6, pp. 64-66 (3 pages)
邱莎%段玻%申浩如%丁海燕
named entity recognition%Chinese personal name recognition%conditional random fields%conditional probability%feature template%sequence labeling
Exploiting the ability of conditional random fields (CRFs) to incorporate arbitrary features, this paper studies Chinese personal name recognition at the character level using a CRF model. Based on the basic and contextual features of Chinese personal names as they appear in text, and taking the overall performance of the model into account, feature templates for the CRF are constructed. The model is trained on a large-scale annotated corpus to estimate the conditional probability distribution of Chinese personal names in text and so obtain the model parameters, and recognition is then carried out as a sequence labeling task. In repeated closed and open tests, the F-measure is above 90% in most cases.
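As a rough illustration of what character-level feature templates and sequence labeling involve, the Python sketch below builds window features for each character and decodes a tag sequence back into name strings. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the actual template or tag set, so the window offsets in TEMPLATE, the B/I/O tags, the "<PAD>" symbol, and the function names are all hypothetical. Training a CRF on these features is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical template: (feature name, character offsets relative to position i).
# "U" entries are single characters, "B" entries are adjacent character pairs.
TEMPLATE: List[Tuple[str, Tuple[int, ...]]] = [
    ("U-2", (-2,)), ("U-1", (-1,)), ("U0", (0,)), ("U+1", (1,)), ("U+2", (2,)),
    ("B-1", (-1, 0)), ("B+1", (0, 1)),
]

PAD = "<PAD>"  # assumed placeholder for positions outside the sentence


def char_at(chars: List[str], i: int) -> str:
    """Return the character at index i, or PAD when i is out of range."""
    return chars[i] if 0 <= i < len(chars) else PAD


def extract_features(chars: List[str], i: int) -> Dict[str, str]:
    """Instantiate every template entry at position i."""
    return {
        name: "/".join(char_at(chars, i + off) for off in offsets)
        for name, offsets in TEMPLATE
    }


def decode_names(chars: List[str], tags: List[str]) -> List[str]:
    """Collect names from an assumed B/I/O tagging (B = begin, I = inside, O = other)."""
    names: List[str] = []
    current: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B":
            if current:
                names.append("".join(current))
            current = [ch]
        elif tag == "I" and current:
            current.append(ch)
        else:
            # "O", or an "I" with no preceding "B": close any open name.
            if current:
                names.append("".join(current))
            current = []
    if current:
        names.append("".join(current))
    return names


# Made-up example sentence and tag sequence, for illustration only.
chars = list("王小明去北京")
tags = ["B", "I", "I", "O", "O", "O"]
features = [extract_features(chars, i) for i in range(len(chars))]
names = decode_names(chars, tags)
```

In a full system, a trained CRF would predict the tags from the per-character feature dictionaries, and decode_names would then turn the predicted tags into recognized names.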