大连理工大学学报
JOURNAL OF DALIAN UNIVERSITY OF TECHNOLOGY
2010
No. 2
pp. 291-295
5 pages
regular expression; meta-character; generating engine; Chinese cultural terms
The string-matching capability of regular expressions (REs) is used to extract Chinese cultural terms and their corresponding English translations from a specialized database. With the existing rules, 11 special characters appear in the expressions, so a single character in an RE may have to go through up to 11 checks before it can be determined whether it matches the corresponding character in the text being searched. To speed up extraction, target-oriented regular expressions are designed to fit the actual pattern of the text being searched, reducing the number of meta-characters from 11 to 3. Experimental results show that extraction speed roughly doubles while accuracy is maintained. In addition, a generating engine for regular expressions is designed to improve the readability of REs and make them easier for users to apply.
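As an illustration of the kind of extraction described above, the following is a minimal sketch, not the authors' pattern. The abstract does not give the actual expressions, the three meta-characters chosen, or the database layout, so the record format (a run of Chinese characters, spaces, then an English gloss), the pattern `PAIR`, and the helper `extract_pairs` are all assumptions made for this example. The pattern is tailored to that assumed format and uses only character classes, grouping, and repetition.

```python
import re

# Assumed record format (hypothetical): Chinese term, spaces, English gloss.
# The pattern is kept small on purpose: it is written for this specific
# layout rather than as a general-purpose expression.
PAIR = re.compile(r"([\u4e00-\u9fff]+) +([A-Za-z][A-Za-z' -]*[A-Za-z])")


def extract_pairs(text):
    """Return (Chinese term, English gloss) tuples found in text."""
    return PAIR.findall(text)


sample = "春节 Spring Festival\n京剧 Peking Opera\n"
print(extract_pairs(sample))
# [('春节', 'Spring Festival'), ('京剧', 'Peking Opera')]
```

The range `\u4e00-\u9fff` covers only the basic CJK Unified Ideographs block; a real corpus with rarer characters or punctuation inside terms would need a wider class.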