Electronic Design Engineering (电子设计工程)
2014, No. 12, pp. 1-4 (4 pages)
string matching; Chinese; inverted index; Bigram
To speed up off-line string matching by building a fast index structure for Chinese text, this paper proposes a Chinese index structure based on Bigrams and two-level hashing. The index processes Chinese characters in GB2312 encoding, uses Chinese Bigrams as its vocabulary terms, and stores the vocabulary in a two-level hash structure. Experimental data show that the proposed index occupies more than twice the storage space of a word index but matches more than four times as fast. The results indicate that the proposed index has a speed advantage in Chinese string matching.
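The abstract does not give the layout of the two-level hash, so the following is only a minimal sketch of how a Bigram inverted index over GB2312 text could be organized. It assumes that the first level is a table addressed directly by the GB2312 slot of the Bigram's first character, that the second level is a per-slot hash keyed by the second character, and that posting lists hold character offsets within a single indexed text. The names `gb2312_slot`, `BigramIndex`, `add`, `postings` and `find` are hypothetical, and the sketch accepts only characters that GB2312 encodes in two bytes.

```python
GB_ROWS = 94  # GB2312 is laid out as a 94 x 94 code table


def gb2312_slot(ch):
    """Map one two-byte GB2312 character to a table slot in 0..8835."""
    hi, lo = ch.encode("gb2312")  # assumption: ch is a two-byte GB2312 character
    return (hi - 0xA1) * GB_ROWS + (lo - 0xA1)


class BigramIndex:
    """Hypothetical two-level vocabulary: first character, then second character."""

    def __init__(self):
        # Level 1: one entry per possible first character, filled lazily.
        self.level1 = [None] * (GB_ROWS * GB_ROWS)

    def add(self, text):
        """Index every adjacent character pair of text with its offset."""
        for pos in range(len(text) - 1):
            first = gb2312_slot(text[pos])
            second = gb2312_slot(text[pos + 1])
            if self.level1[first] is None:
                self.level1[first] = {}  # Level 2: hash keyed by second character
            self.level1[first].setdefault(second, []).append(pos)

    def postings(self, a, b):
        """Return the offsets at which the Bigram (a, b) starts."""
        level2 = self.level1[gb2312_slot(a)]
        if level2 is None:
            return []
        return level2.get(gb2312_slot(b), [])

    def find(self, pattern):
        """Return the sorted start offsets of pattern (two or more characters)."""
        if len(pattern) < 2:
            raise ValueError("pattern must contain at least one Bigram")
        candidates = set(self.postings(pattern[0], pattern[1]))
        for k in range(1, len(pattern) - 1):
            # The k-th Bigram must occur exactly k characters after the start.
            shifted = {p - k for p in self.postings(pattern[k], pattern[k + 1])}
            candidates &= shifted
            if not candidates:
                break
        return sorted(candidates)


index = BigramIndex()
index.add("中文索引结构提高中文串匹配速度")
print(index.find("中文"))    # [0, 8]
print(index.find("串匹配"))  # [10]
```

Because every adjacent character pair becomes a vocabulary term, no word segmentation is needed at query time, which is consistent with the speed advantage the abstract reports. The same design also stores one posting per character position, which is consistent with the larger space consumption compared with a word index.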