计算机学报
計算機學報
계산궤학보
CHINESE JOURNAL OF COMPUTERS
2014年
5期
1147-1158
,共12页
网络安全%超大规模特征匹配%串匹配%混合自动机%算法%信息安全
網絡安全%超大規模特徵匹配%串匹配%混閤自動機%算法%信息安全
망락안전%초대규모특정필배%천필배%혼합자동궤%산법%신식안전
network security%super large scale patterns matching%string matching%hybrid automaton%algorithm%information security
串匹配技术是入侵检测系统中的关键技术,随着特征数量的增加,现有的自动机类匹配算法都会面对内存占用过大的问题.当特征超过一定数目后,自动机可能根本无法构造.文中提出了一种针对超大规模特征匹配(SLSPM)环境的匹配算法SLSPM.SLSPM算法借助一个块式匹配自动机和若干个普通自动机完成匹配工作,而且能够支持至少上万规模的特征集.与普通匹配自动机先读入状态再判断读入符号的方式不同,SLSPM首先使用散列函数判断当前文本块是否可以被过滤掉.如果文本块无法被过滤且为合法文本块时,再检查当前状态是否是一个能够识别当前文本块的状态.仅在当前状态吻合的情况下再读入下一个文本块进行后续匹配.理论证明显示SLSPM算法具有近似O(n)的复杂度.由于SLSPM算法未能保存全部的跳转信息,其匹配速度相对于高级Aho-Corasick算法未有大幅提升.算法的优势在于,该算法在软件环境下能够维持与AC算法相同的匹配性能,而且能够将特征加载规模至少提升至上万以适应超大规模特征集匹配环境.
串匹配技術是入侵檢測繫統中的關鍵技術,隨著特徵數量的增加,現有的自動機類匹配算法都會麵對內存佔用過大的問題.噹特徵超過一定數目後,自動機可能根本無法構造.文中提齣瞭一種針對超大規模特徵匹配(SLSPM)環境的匹配算法SLSPM.SLSPM算法藉助一箇塊式匹配自動機和若榦箇普通自動機完成匹配工作,而且能夠支持至少上萬規模的特徵集.與普通匹配自動機先讀入狀態再判斷讀入符號的方式不同,SLSPM首先使用散列函數判斷噹前文本塊是否可以被過濾掉.如果文本塊無法被過濾且為閤法文本塊時,再檢查噹前狀態是否是一箇能夠識彆噹前文本塊的狀態.僅在噹前狀態吻閤的情況下再讀入下一箇文本塊進行後續匹配.理論證明顯示SLSPM算法具有近似O(n)的複雜度.由于SLSPM算法未能保存全部的跳轉信息,其匹配速度相對于高級Aho-Corasick算法未有大幅提升.算法的優勢在于,該算法在軟件環境下能夠維持與AC算法相同的匹配性能,而且能夠將特徵加載規模至少提升至上萬以適應超大規模特徵集匹配環境.
천필배기술시입침검측계통중적관건기술,수착특정수량적증가,현유적자동궤류필배산법도회면대내존점용과대적문제.당특정초과일정수목후,자동궤가능근본무법구조.문중제출료일충침대초대규모특정필배(SLSPM)배경적필배산법SLSPM.SLSPM산법차조일개괴식필배자동궤화약간개보통자동궤완성필배공작,이차능구지지지소상만규모적특정집.여보통필배자동궤선독입상태재판단독입부호적방식불동,SLSPM수선사용산렬함수판단당전문본괴시부가이피과려도.여과문본괴무법피과려차위합법문본괴시,재검사당전상태시부시일개능구식별당전문본괴적상태.부재당전상태문합적정황하재독입하일개문본괴진행후속필배.이론증명현시SLSPM산법구유근사O(n)적복잡도.유우SLSPM산법미능보존전부적도전신식,기필배속도상대우고급Aho-Corasick산법미유대폭제승.산법적우세재우,해산법재연건배경하능구유지여AC산법상동적필배성능,이차능구장특정가재규모지소제승지상만이괄응초대규모특정집필배배경.
The current string matching algorithms nearly can not afford the burden of large memorydemand when the patters amount increases dramatically.Matching automaton can not be estab-lished at all when the amount of patterns is at least tens of thousands.We present a solution tothe problem of super large scale patterns matching (SLSPM).In our design,a matching trie isdivided into one block matching trie and many general character matching tries if possible.Duringa block matching procedure our block matching automaton (trie)does not read the current statefirst.Instead,the automaton first reads the current text block symbol and decides whether it willbe matched or not by a hash function.Then,the automaton looks for the current state in thestates set in which all the states recognize the same current text block symbol.After the currentstate is found the automaton continues to read the next text block symbol.The theoretical analysisshows that under the worst case the proposed algorithm takes O(n)time approximately,where n is the length of the text.The experiment results show that our design matches only a little fasterthan the advanced Aho-Corasick because in the advanced Aho-Corasick the entire possible transitioninformation has been stored.The advantage of SLSPMis that under software environmentSLSPMis not slower than AC during the matching procedure,and also at least tens of thousands patterns can be loaded into the hybrid automatons of SLSPMso that it can be used well for superlarge scale patters matching environment.