电子与信息学报
JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY
2013, Issue 4, pp. 958-964 (7 pages)
张震*%汪斌强%陈鸿昶%马海龙
流量分类%用户连接图%用户相似度%图挖掘
Traffic classification%Host Connection Graph (HCG)%User similarity%Graph mining
针对机器学习分类算法的“概念漂移”现象,该文提出了一种基于用户连接图的(Host Connection Graph, HCG)流量分类机制.算法将{IP Address, Port}作为用户唯一标识,构建了用户连接图,提出了“用户相似度”的概念;应用“图挖掘”理论将用户连接图划分为互不相交的行为子簇,使得用户之间的相互通信抽象为一种“社会团体”;通过定义基于信息熵的“用户行为模式”(UBM),分析了各个行为子簇背后表现出的业务特征,并使用“UBM+Port”对用户行为子簇进行了业务标签映射,实现了流量分类的目的.仿真实验表明:在不牺牲识别准确率的前提下,算法不仅能克服“概念漂移”问题,还能有效降低算法的计算复杂度.
To address the concept drift problem in machine-learning-based traffic classification, this paper proposes a traffic classification mechanism based on the Host Connection Graph (HCG). Taking {IP Address, Port} as the unique user identifier, the algorithm constructs a host connection graph and introduces the concept of user similarity. Applying graph mining theory, it partitions the graph into mutually disjoint behavior clusters, so that communication among hosts is abstracted as a social community. It then defines an information-entropy-based User Behavior Mode (UBM) to analyze the traffic characteristics underlying each behavior cluster, and maps application labels to the clusters using UBM and Port, thereby classifying the traffic. Simulations on a real network trace show that HCG overcomes the concept drift problem and reduces computational complexity without sacrificing identification accuracy.
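The pipeline outlined in the abstract (endpoint graph, disjoint clusters, entropy-based description of each cluster) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not define user similarity or UBM, so the sketch substitutes connected components for the graph-mining partition and uses the Shannon entropy of the port distribution as a stand-in entropy feature. The function names `build_hcg`, `disjoint_clusters`, and `port_entropy`, and the sample flows, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_hcg(flows):
    """Build an undirected graph whose nodes are (ip, port) endpoints."""
    graph = defaultdict(set)
    for src, dst in flows:
        graph[src].add(dst)
        graph[dst].add(src)
    return dict(graph)


def disjoint_clusters(graph):
    """Assumed stand-in for the paper's partition: connected components."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(component)
    return clusters


def port_entropy(cluster):
    """Shannon entropy (bits) of the port distribution inside one cluster."""
    counts = Counter(port for _ip, port in cluster)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical flows: each flow is a pair of (ip, port) endpoints.
flows = [
    (("10.0.0.1", 50001), ("10.0.0.9", 80)),
    (("10.0.0.2", 50002), ("10.0.0.9", 80)),
    (("10.0.0.3", 6881), ("10.0.0.4", 6881)),
]
clusters = disjoint_clusters(build_hcg(flows))
entropies = [port_entropy(c) for c in clusters]
```

In this toy input the two clients of the port-80 endpoint fall into one cluster and the two port-6881 peers into another. A label-mapping step in the spirit of "UBM+Port" would then assign an application label to each cluster from such features, but the actual mapping rule is not given in the abstract.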