Computer Applications and Software (计算机应用与软件)
2015
Issue 4
pp. 80-82, 86
4 pages in total
Keywords: clustering; text clustering; fuzzy c-means (FCM); Euclidean distance; Mahalanobis distance; automatic paper marking
Traditional fuzzy-partition clustering algorithms based on Euclidean distance are best suited to clusters with a spherical structure. When they are applied to high-dimensional text clustering, both accuracy and efficiency decline. To address this problem, we propose a text clustering algorithm based on Mahalanobis distance. The algorithm can discover clusters with a non-spherical structure and obtains the clustering result through mathematical iteration alone, without prior knowledge. Given the wide use of paperless examination systems, we apply the algorithm to an automatic marking system for subjective questions. Simulation experiments on several types of subjective questions show that, compared with the c-means and FCM algorithms, the proposed algorithm achieves higher accuracy and also converges faster.
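The contrast between the two distance measures can be illustrated with a minimal sketch. It is not the paper's algorithm, which the abstract does not specify. It only shows, for a hypothetical 2-D cluster with an assumed covariance matrix, how Mahalanobis distance accounts for an elongated (non-spherical) cluster shape where Euclidean distance does not, and how such distances would feed the standard FCM membership formula. The function names, the covariance values, and the fuzzifier m = 2 are all assumptions chosen for illustration.

```python
import math


def euclidean(x, center):
    """Plain Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, center)))


def mahalanobis_2d(x, center, cov):
    """Mahalanobis distance for 2-D points; cov is a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - center[0], x[1] - center[1]
    # Quadratic form (x - center)^T * inv(cov) * (x - center).
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)


def fcm_memberships(dists, m=2.0):
    """Standard FCM membership update from one point's distances to all centers.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    """
    zeros = [i for i, dist in enumerate(dists) if dist == 0]
    if zeros:
        # The point coincides with at least one center: share membership among them.
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(dists))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]


# Hypothetical cluster centered at the origin, stretched along the x-axis.
center = (0.0, 0.0)
cov = ((9.0, 0.0), (0.0, 1.0))  # assumed covariance, for illustration only
p, q = (3.0, 0.0), (0.0, 3.0)

print(euclidean(p, center), euclidean(q, center))  # the two points are equally far
print(mahalanobis_2d(p, center, cov), mahalanobis_2d(q, center, cov))  # p is closer
print(fcm_memberships([1.0, 3.0]))  # memberships sum to 1
```

Under Euclidean distance the two points are equidistant from the center, whereas under Mahalanobis distance the point lying along the cluster's long axis is treated as much closer. This is the property that lets a Mahalanobis-based method fit non-spherical clusters.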