重庆工商大学学报:自然科学版
Journal of Chongqing Technology and Business University (Natural Science Edition)
2012, No. 8, pp. 36-41 (6 pages)
Keywords: Bayesian network classification; naive Bayesian network; K-means clustering; data mining
Naive Bayesian network classification models are relatively inefficient when processing large volumes of high-dimensional data, and their accuracy leaves room for improvement. To address this, the paper combines principal component analysis (PCA) with the K-means clustering algorithm to construct an improved naive Bayesian network classification model. The model drops the premise that the non-class attribute variables are independent of one another given the class attribute variable. The algorithm first applies PCA to reduce the dimensionality of the data set while preserving as much of its information as possible, so that the algorithm can concentrate on the classification problem itself. It also introduces a concept called the "relative fusion point", which effectively improves the algorithm's performance. Finally, the performance of the algorithm is analyzed, and the improved algorithm is applied to a real data set, where its classification results are used to fill in missing values in that data set; the results show that the algorithm is effective.
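The abstract does not specify how K-means or the "relative fusion point" enter the model, so the sketch below covers only the standard baseline pipeline the paper builds on. It runs PCA for dimensionality reduction, fits a Gaussian naive Bayes classifier on the principal components, and uses the classifier's predictions to fill in missing class labels. This is a minimal NumPy sketch under stated assumptions, not the authors' improved model: the Gaussian per-feature likelihoods, the choice of k = 3 components, the synthetic data, and the `-1` marker for a missing label are all illustrative, and the helper names (`pca_fit`, `pca_transform`, `GaussianNaiveBayes`) are hypothetical.

```python
import numpy as np

def pca_fit(X, k):
    """Return (mean, components) keeping the top-k principal directions."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are principal directions,
    # ordered by decreasing singular value (i.e. explained variance).
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_transform(X, mean, components):
    """Project samples onto the retained principal directions."""
    return (X - mean) @ components.T

class GaussianNaiveBayes:
    """Baseline naive Bayes (assumption): features are treated as independent
    given the class, each modelled by a per-class univariate Gaussian."""

    def fit(self, Z, y):
        self.classes = np.unique(y)
        self.priors = np.array([np.mean(y == c) for c in self.classes])
        self.means = np.array([Z[y == c].mean(axis=0) for c in self.classes])
        # Small variance floor avoids division by zero for constant features.
        self.vars = np.array([Z[y == c].var(axis=0) for c in self.classes]) + 1e-9
        return self

    def predict(self, Z):
        # log P(c) + sum_j log N(z_j | mu_cj, var_cj) for every sample/class pair.
        diff2 = (Z[:, None, :] - self.means[None, :, :]) ** 2
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars)[None, :, :]
                          + diff2 / self.vars[None, :, :]).sum(axis=2)
        scores = np.log(self.priors)[None, :] + log_lik
        return self.classes[np.argmax(scores, axis=1)]

# Synthetic example (made-up data): two classes in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 10)),
               rng.normal(3.0, 1.0, size=(100, 10))])
y_true = np.array([0] * 100 + [1] * 100)

# Hide 20 class labels; -1 marks a missing value to be repaired.
y = y_true.copy()
missing = rng.choice(200, size=20, replace=False)
y[missing] = -1

# Reduce dimensionality, train on labeled rows, predict the missing labels.
mean, comps = pca_fit(X, k=3)
Z = pca_transform(X, mean, comps)
labeled = y != -1
model = GaussianNaiveBayes().fit(Z[labeled], y[labeled])
y_filled = y.copy()
y_filled[~labeled] = model.predict(Z[~labeled])

acc = np.mean(y_filled[missing] == y_true[missing])
print("accuracy on repaired labels:", acc)
```

PCA components are mutually uncorrelated (though not necessarily independent), which tends to make the baseline's conditional-independence assumption less restrictive; the abstract does not say whether this is the authors' motivation for using PCA.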