CAJ | Academic paper

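Ensemble classifiers are widely used in data stream classification models to weaken the impact of concept drift. Typically, when the classification accuracy of a base classifier falls below a specified threshold, the ensemble classifier learns a new classifier to replace the inaccurate one and thereby overcomes the effect of concept drift. However, starting to learn only once a base classifier's accuracy has fallen below the threshold makes the ensemble classifier's judgment of the current concept lag behind. This paper therefore adds scenario characteristic analysis to the ensemble classifier: scenario characteristics are extracted with the information gain method, and their thresholds are set dynamically to predict the occurrence of concept drift in advance. When the change in a scenario characteristic exceeds the scenario threshold, the ensemble classifier is notified immediately to learn again and produce a new base classifier, rather than waiting until the base classifier's accuracy falls below the ensemble classifier's threshold, which gives the ensemble classifier a degree of feed-forward capability. Experimental analysis on specific data shows that the proposed OCEC (Origin Characteristics Ensemble Classifier) model reduces the ensemble generalization error when mining data streams with concept drift and improves the effectiveness of concept drift detection.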
Data mining techniques have been applied in many domains, such as retailing, the stock market, the telecommunications industry, and medicine. The data streams generated in these industries are not stable; they change continuously. Moreover, these changes are unpredictable and cause the target concepts to shift, a phenomenon generally known in the literature as concept drift. The relationships between hidden contexts and concepts remain unclear. Modeling data streams that contain concept drift is one of the core problems in data mining, because a changing target concept reduces the accuracy of the model and requires the corresponding decision model to be revised to handle the current input data. The models and algorithms in the existing literature fall into three groups: (1) instance selection learning methods, (2) instance weighting learning methods, and (3) ensemble classification learning methods (or learning with multiple concept descriptions). In ensemble methods, the base classifiers reflect the current concept, and the class of a sample is predicted by integrating all of their classification results. Ensemble classifiers have been widely used to weaken the impact of concept drift on data stream classification models. When the predictive accuracy of a base classifier in these models falls below a given threshold, the ensemble classifier learns a new base classifier and replaces the old one to overcome the influence of the concept drift. However, because the ensemble classifier starts to learn only after the accuracy of a base classifier has dropped below the threshold, its identification of the current concept may lag. This paper proposes a new method that adds scenario characteristic analysis to the ensemble classifier and uses the information gain method to extract the scenario characteristics.
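As an illustration of the information gain step, the following minimal Python sketch ranks discrete features by their information gain with respect to the class label. The abstract does not say how OCEC selects scenario characteristics from the ranking, so the function names (`entropy`, `information_gain`, `rank_features`) and the assumption that features are categorical are illustrative only, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return (feature indices sorted by decreasing gain, list of gains per feature)."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return order, gains


# Hypothetical toy chunk: feature 0 determines the class, feature 1 is uninformative.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = [0, 0, 1, 1]
order, gains = rank_features(rows, labels)
```

In this toy chunk the first feature separates the classes perfectly and the second carries no information, so the first would be the natural candidate for a scenario characteristic under this assumed selection rule.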
In addition, the threshold of each scenario characteristic is set dynamically in order to predict the occurrence of concept drift. When the variation of the scenario characteristics exceeds the scenario threshold, the ensemble classifier is prompted immediately to create a new base classifier rather than waiting until the accuracy of a base classifier falls below the given threshold, which makes the ensemble classifier capable of feed-forward learning. Computational experiments on selected data show that the proposed OCEC (Origin Characteristics Ensemble Classifier) model reduces the ensemble generalization error when mining data streams with concept drift and improves the effectiveness of concept drift detection.
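The feed-forward trigger can be sketched as a small monitor that watches one scenario characteristic from chunk to chunk. The abstract does not state how the dynamic threshold is computed, so this sketch assumes a simple rule (mean plus `k` standard deviations of recently observed changes); the class name `ScenarioDriftMonitor` and its parameters `k`, `history`, and `min_history` are hypothetical.

```python
import statistics
from collections import deque


class ScenarioDriftMonitor:
    """Signals retraining when a scenario characteristic changes more than usual.

    Assumed rule (not from the paper): threshold = mean + k * std of recent changes.
    """

    def __init__(self, k=3.0, history=20, min_history=5):
        self.k = k
        self.min_history = min_history
        self.changes = deque(maxlen=history)  # recent chunk-to-chunk changes
        self.previous = None

    def update(self, value):
        """Feed the characteristic's value for the newest chunk; return True to retrain."""
        if self.previous is None:
            self.previous = value
            return False
        change = abs(value - self.previous)
        self.previous = value
        triggered = False
        if len(self.changes) >= self.min_history:
            threshold = (statistics.mean(self.changes)
                         + self.k * statistics.pstdev(self.changes))
            triggered = change > threshold
        if not triggered:
            # Only ordinary changes update the threshold, so a drift does not inflate it.
            self.changes.append(change)
        return triggered


# Hypothetical per-chunk values of one scenario characteristic; a jump occurs at index 7.
values = [0.50, 0.51, 0.50, 0.52, 0.51, 0.50, 0.51, 0.90, 0.91]
monitor = ScenarioDriftMonitor()
flags = [monitor.update(v) for v in values]
```

With these illustrative values, only the jump from 0.51 to 0.90 raises the flag. In an ensemble setting, a `True` result would be the point at which a new base classifier is trained on the most recent data, before the accuracy of the existing base classifiers has had time to degrade.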