管理工程学报
Journal of Industrial Engineering and Engineering Management
2012
Issue 2
pp. 85-93
Mahalanobis-Taguchi System; classification; imbalanced data; probabilistic threshold model; omni-optimization algorithm
In classification problems, class imbalance biases classifier training and lowers diagnostic sensitivity for minority-class samples. The Mahalanobis-Taguchi System (MTS) is a multivariate diagnostic and forecasting technique that builds a continuous measurement scale rather than learning directly from the training samples; this property is expected to make it insensitive to the data distribution and so overcome the class imbalance problem. To address the shortcomings of MTS threshold calculation and the requirements of imbalanced-data classification, this paper studies a probabilistic threshold model for computing the MTS threshold. To address several further weaknesses of MTS, it replaces orthogonal arrays and signal-to-noise ratios with an optimization model for screening key variables, solved with an omni-optimization algorithm. Experiments on 8 UCI data sets show that the improved MTS not only classifies imbalanced data well but also screens key variables, with a marked dimensionality-reduction effect.
In imbalanced binary classification, one class is represented by a large number of examples while the other, usually the more important class, is represented by only a few. Traditional classification techniques assume that training examples are evenly distributed among the classes; when they are not, training is biased and the classifier tends to predict the minority class poorly. Several researchers have studied approaches at the data level and at the algorithm level to cope with the class imbalance problem. However, data-level methods can remove important information or introduce noise, and algorithm-level methods lack a systematic foundation and may end up with rules that overfit the training data. The Mahalanobis-Taguchi System (MTS) is a collection of methods proposed as a diagnostic and forecasting technique for multivariate data. MTS combines the Mahalanobis distance (MD) with Taguchi's robust engineering: MD is used to construct a multidimensional measurement scale, whereas Taguchi's robust engineering is applied to determine the important variables and optimize the system. MTS establishes a classification model by constructing a continuous measurement scale from samples of a single class rather than learning directly from the whole training set, a property that seems useful for class imbalance problems. This study investigates whether MTS classifies better than other classification techniques when facing class imbalance. The paper develops a probabilistic threshold model (PTM) to determine the classification threshold of MTS. To address the inadequacies of MTS, the authors also propose an improved MTS optimization model.
The core idea is that a number of optimization objectives are proposed based on the purpose of the classification problem, and an optimization model is used to screen important variables in place of orthogonal arrays and signal-to-noise ratios.
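
The measurement scale that the abstract describes can be illustrated with a minimal sketch. It assumes the commonly used scaled Mahalanobis distance, MD = z^T R^{-1} z / k, where z is an observation standardized with the reference-class mean and standard deviation, R is the reference-class correlation matrix, and k is the number of variables. The function names and the synthetic data are hypothetical, and the sketch covers neither the probabilistic threshold model nor the variable-screening optimization, whose details the abstract does not give.

```python
import numpy as np

def fit_reference_space(X_ref):
    """Estimate the scale from the reference (single-class) samples only."""
    mean = X_ref.mean(axis=0)
    std = X_ref.std(axis=0, ddof=1)           # sample standard deviation
    Z = (X_ref - mean) / std                  # standardized reference data
    corr_inv = np.linalg.inv(np.corrcoef(Z, rowvar=False))
    return mean, std, corr_inv

def scaled_md(X, mean, std, corr_inv):
    """Scaled Mahalanobis distance of each row of X to the reference space."""
    Z = (np.atleast_2d(X) - mean) / std
    k = Z.shape[1]                            # number of variables
    # Row-wise quadratic form z^T R^{-1} z, divided by k
    return np.einsum("ij,jk,ik->i", Z, corr_inv, Z) / k

# Synthetic, imbalanced example (assumed data): many reference samples, few abnormal ones
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(200, 3))
X_abn = rng.normal(loc=4.0, size=(10, 3))

mean, std, corr_inv = fit_reference_space(X_ref)
md_ref = scaled_md(X_ref, mean, std, corr_inv)
md_abn = scaled_md(X_abn, mean, std, corr_inv)
```

Because the scale is fitted on the reference class alone, the number of abnormal samples does not affect it. With this estimator the average scaled MD of the reference samples is (n - 1)/n, close to 1, and samples far from the reference space score well above that; a threshold on the scale then separates the two classes.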