情报工程
Technology Intelligence Engineering
2015
Issue 2
pp. 64-72
9 pages in total
张秋子%陆伟%程齐凯%黄永
学术文本%缩写%机器学习%序列标注%信息抽取
Academic texts%abbreviations/acronyms%machine learning%sequence labelling%information extraction
为实现海量英文学术文本中缩写词及对应缩写定义的识别,本文提出了一种自动缩写识别算法MELearn-AI。该算法在人工标注数据集的基础上,从序列标注的角度,通过最大熵模型实现了计算机领域英文学术文本中的自动缩写识别。MELearn-AI在本文构建的评测数据集“Paren-sen”上得到了95.8%的查准率和86.3%的查全率,相对于其他两组对照实验的效果有较为明显的提升。本文提出的自动缩写识别方法能够在计算机领域的学术文本上取得令人满意的效果,有助于更好地理解并利用该领域术语。
To identify abbreviations and their corresponding definitions in large volumes of English academic text, this paper proposes an automatic abbreviation identification algorithm called MELearn-AI. Treating the task as a sequence labelling problem, MELearn-AI trains a maximum entropy model on a manually labelled dataset and then uses the model to identify abbreviations in computer science academic texts. On the "Paren-sen" evaluation dataset constructed in this paper, the method achieves a precision of 95.8% and a recall of 86.3%, a clear improvement over the two control experiments it is compared against. The proposed method achieves satisfactory results on English academic texts in computer science and helps readers better understand and use the terminology of this field.
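The abstract does not describe the features or training procedure of MELearn-AI, so the following is only a minimal sketch of the general idea it names: tagging each token as part of a definition (DEF), an abbreviation (ABBR), or neither (O) with a maximum entropy (multinomial logistic) classifier. The feature names, the two training sentences, and the functions `features`, `train`, and `tag` are all hypothetical and chosen for illustration; each token is classified independently here, which is simpler than a full sequence model.

```python
import math
from collections import defaultdict

LABELS = ["O", "DEF", "ABBR"]

def features(tokens, i):
    """Binary indicator features for token i (illustrative, not the paper's feature set)."""
    tok = tokens[i]
    if tok in ("(", ")"):
        return ["punct"]
    in_parens = 0 < i < len(tokens) - 1 and tokens[i - 1] == "(" and tokens[i + 1] == ")"
    if in_parens and tok.isupper():
        return ["in_parens", "all_caps"]
    # Look ahead for a parenthesised all-caps candidate whose k-th letter
    # matches the initial of the k-th word immediately before the "(".
    for j in range(i + 1, len(tokens) - 2):
        if tokens[j] == "(" and tokens[j + 2] == ")" and tokens[j + 1].isupper():
            abbr = tokens[j + 1]
            k = i - (j - len(abbr))  # position of token i in the candidate long form
            if 0 <= k < len(abbr) and tok[0].upper() == abbr[k]:
                return ["initial_match"]
    return ["no_cue"]

def predict_proba(weights, feats):
    """Maximum entropy model: p(y | x) is a softmax over summed feature weights."""
    scores = {y: sum(weights[(f, y)] for f in feats) for y in LABELS}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

def train(sentences, epochs=50, lr=0.5):
    """Batch gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)  # (feature, label) -> weight
    data = [(features(toks, i), lab)
            for toks, labs in sentences for i, lab in enumerate(labs)]
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in data:
            probs = predict_proba(weights, feats)
            for f in feats:
                for y in LABELS:
                    # Gradient: observed feature count minus expected count.
                    grad[(f, y)] += (1.0 if y == gold else 0.0) - probs[y]
        for key, g in grad.items():
            weights[key] += lr * g / len(data)
    return weights

def tag(weights, tokens):
    """Assign the most probable label to each token."""
    out = []
    for i in range(len(tokens)):
        probs = predict_proba(weights, features(tokens, i))
        out.append(max(LABELS, key=lambda y: probs[y]))
    return out

# Hypothetical hand-labelled training sentences (not from the paper's dataset).
TRAIN = [
    ("we use a support vector machine ( SVM ) classifier".split(),
     ["O", "O", "O", "DEF", "DEF", "DEF", "O", "ABBR", "O", "O"]),
    ("the hidden markov model ( HMM ) is trained".split(),
     ["O", "DEF", "DEF", "DEF", "O", "ABBR", "O", "O", "O"]),
]

model = train(TRAIN)
print(tag(model, "a conditional random field ( CRF ) works".split()))
# ['O', 'DEF', 'DEF', 'DEF', 'O', 'ABBR', 'O', 'O']
```

In this toy setup every feature co-occurs with a single label, so the classifier mostly learns to map cues to tags. A realistic system would need richer, overlapping features and far more labelled data, and would be evaluated by precision and recall as the abstract reports.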