Computer Technology and Development
2015, Issue 1, pp. 137-142 (6 pages)
Keywords: massive data; cloud computing; rough set; incomplete information system; reduction; MapReduce
Knowledge reduction for massive datasets has been a research focus in rough set theory in recent years. Traditional knowledge reduction algorithms for incomplete information systems assume that all the data to be processed can be loaded into main memory at once. This is clearly infeasible for large-scale datasets, and even more so for massive datasets with missing information. To address this, the characteristics of massive datasets with missing information are analyzed in depth, and each missing attribute value is represented by all possible values of that attribute. Then, by combining the parallelism available in classical knowledge reduction algorithms with the discernibility (indiscernibility) of attributes (attribute sets), a knowledge reduction algorithm for incomplete information systems is designed under the MapReduce framework. Experimental results demonstrate that the algorithm is effective and feasible and can efficiently perform knowledge reduction on massive datasets in incomplete information systems.
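The abstract only outlines the approach. What follows is a minimal Python sketch, under stated assumptions, of how letting a missing value "take all possible values" fits a map/shuffle/reduce pipeline. Once every missing value is expanded to each value in its attribute's domain, two objects share a key exactly when, on every attribute in the chosen subset, their values are equal or at least one is missing. That is the usual tolerance relation for incomplete information systems. The MapReduce stages are simulated in-process. Several pieces are hypothetical illustrations rather than the paper's algorithm: the `*` missing-value marker, all function names, and the choice of a positive-region forward-selection heuristic for the reduct.

```python
from collections import defaultdict
from itertools import product

MISSING = "*"  # assumed marker for a missing attribute value


def expand(row, attrs, domains):
    """Replace each missing value on `attrs` by every possible value of that attribute."""
    choices = [domains[a] if row[a] == MISSING else [row[a]] for a in attrs]
    return product(*choices)


def map_phase(table, attrs, domains, decision):
    # Mapper: emit (expanded key restricted to attrs, (object id, decision value)).
    for oid, row in enumerate(table):
        for key in expand(row, attrs, domains):
            yield key, (oid, row[decision])


def shuffle(pairs):
    # Simulated shuffle: group mapper output by key, as the framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reducer 1: objects sharing a key are mutually tolerant; collect the
    # decision values that co-occur with each object.
    seen = defaultdict(set)
    for members in groups.values():
        decisions = {d for _, d in members}
        for oid, _ in members:
            seen[oid] |= decisions
    # Reducer 2 (a separate job keyed by object id in a real deployment):
    # an object is in the positive region iff all objects tolerant with it
    # share its decision value.
    return {oid for oid, ds in seen.items() if len(ds) == 1}


def positive_region(table, attrs, domains, decision):
    return reduce_phase(shuffle(map_phase(table, attrs, domains, decision)))


def greedy_reduct(table, conditions, domains, decision):
    """Hypothetical forward selection: add the attribute that most enlarges POS_B(D)."""
    target = len(positive_region(table, conditions, domains, decision))
    reduct = []
    while len(positive_region(table, reduct, domains, decision)) < target:
        best = max(
            (a for a in conditions if a not in reduct),
            key=lambda a: len(positive_region(table, reduct + [a], domains, decision)),
        )
        reduct.append(best)
    return reduct


# Toy incomplete table (hypothetical): attribute c duplicates a, b is missing for object 2.
domains = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}
table = [
    {"a": 0, "b": 0, "c": 0, "d": "y"},
    {"a": 0, "b": 1, "c": 0, "d": "n"},
    {"a": 1, "b": MISSING, "c": 1, "d": "y"},
]
print(greedy_reduct(table, ["a", "b", "c"], domains, "d"))  # ['a', 'b']
```

On the toy table the redundant attribute `c` is never selected. Expanding missing values multiplies the mapper output by the domain sizes of the missing attributes, so a production version would need to bound or encode that blow-up. This simple heuristic also does not prune redundant attributes after selection, as a full reduction algorithm would.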