北京中医药大学学报
北京中醫藥大學學報
북경중의약대학학보
Journal of Beijing University of Traditional Chinese Medicine
2015
Issue 9
587-590
4 pages in total
孟洪宇%谢晴宇%常虹%孟庆刚
孟洪宇%謝晴宇%常虹%孟慶剛
맹홍우%사청우%상홍%맹경강
中医术语%条件随机场%伤寒论%自动识别
中醫術語%條件隨機場%傷寒論%自動識別
중의술어%조건수궤장%상한론%자동식별
TCM terminology%conditional random fields%Shanghan Lun%automatic identification
目的:探索中医术语的自动识别方法,扩充中医文本的自然语言处理形式。方法采用基于条件随机场( CRF)的方法,针对《伤寒论》文本中的症状、病名、脉象、方剂等中医术语的自动识别标注问题,通过结合字本身、词性、词边界、术语类别标注的特征,分析不同特征组合对术语识别的影响,并探讨最具有效性的组合。结果以字本身、词边界、词性、类别标签为特征组合的中医术语识别模型准确率为85.00%,召回率为68.00%,F值为75.56%。结论字本身、词性、词边界、术语类别标注的多特征融合的模型识别效果最优。
目的:探索中醫術語的自動識別方法,擴充中醫文本的自然語言處理形式。方法採用基於條件隨機場(CRF)的方法,針對《傷寒論》文本中的症狀、病名、脈象、方劑等中醫術語的自動識別標註問題,通過結合字本身、詞性、詞邊界、術語類別標註的特徵,分析不同特徵組合對術語識別的影響,並探討最具有效性的組合。結果以字本身、詞邊界、詞性、類別標籤為特徵組合的中醫術語識別模型準確率為85.00%,召回率為68.00%,F值為75.56%。結論字本身、詞性、詞邊界、術語類別標註的多特徵融合的模型識別效果最優。
목적:탐색중의술어적자동식별방법,확충중의문본적자연어언처리형식。방법채용기우조건수궤장( CRF)적방법,침대《상한론》문본중적증상、병명、맥상、방제등중의술어적자동식별표주문제,통과결합자본신、사성、사변계、술어유별표주적특정,분석불동특정조합대술어식별적영향,병탐토최구유효성적조합。결과이자본신、사변계、사성、유별표첨위특정조합적중의술어식별모형준학솔위85.00%,소회솔위68.00%,F치위75.56%。결론자본신、사성、사변계、술어유별표주적다특정융합적모형식별효과최우。
Objective: To explore methods for the automatic identification of TCM terminology and to expand the forms of natural language processing applied to TCM texts. Methods: A conditional random field (CRF) approach was applied to the automatic identification and annotation of TCM terms for symptoms, diseases, pulse types, and prescriptions in the text of Shanghan Lun. The effects of different combinations of features (the Chinese character itself, part of speech, word boundary, and term category label) on term identification were analyzed, and the most effective combination was selected. Results: The TCM term identification model that combined the Chinese character itself, word boundary, part of speech, and term category label as features achieved a precision of 85.00%, a recall of 68.00%, and an F-score of 75.56%. Conclusion: Of the feature combinations tested, the multi-feature model combining the Chinese character itself, part of speech, word boundary, and term category label achieved the best identification performance.
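The abstract names per-character features and reports precision, recall, and an F-score, so a small sketch may help make both concrete. The Python below is illustrative only and is not the authors' implementation: the function names `char_features` and `f_score`, the B/M/E/S boundary tags, and the part-of-speech tags in the usage example are all assumptions, since the abstract does not specify the tagging scheme or feature template. The F-score is taken to be the balanced harmonic mean of precision and recall, which reproduces the reported 75.56%.

```python
from typing import Dict, List, Tuple


def char_features(segmented: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Build one feature dict per character from (word, part-of-speech) pairs.

    Assumed scheme (not stated in the abstract): the word boundary is encoded
    as B (begin), M (middle), E (end), or S (single-character word).
    """
    rows = []
    for word, pos in segmented:
        for i, ch in enumerate(word):
            if len(word) == 1:
                boundary = "S"
            elif i == 0:
                boundary = "B"
            elif i == len(word) - 1:
                boundary = "E"
            else:
                boundary = "M"
            rows.append({"char": ch, "pos": pos, "boundary": boundary})
    return rows


def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical segmented input; the POS tags here are placeholders.
features = char_features([("发热", "v"), ("而", "c")])

# Check the reported figures: precision 85.00%, recall 68.00%.
reported_f = f"{f_score(0.85, 0.68) * 100:.2f}%"
print(reported_f)  # 75.56%
```

In a CRF setup of this kind, each feature dict would typically be paired with a term category label for the same character before training; how the study encoded those labels is not given in the abstract.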