软件学报
JOURNAL OF SOFTWARE
2014, No. 9, pp. 2119-2135 (17 pages)
徐菲菲 (Xu Feifei), 雷景生 (Lei Jingsheng), 毕忠勤 (Bi Zhongqin), 苗夺谦 (Miao Duoqian), 杜海舟 (Du Haizhou)
big data; interval-valued; approximate reduction; multi-decision tables; global reduction
In electric power big data, many applications, such as load forecasting and fault diagnosis, must determine the decision class from how the data change over a period of time. Assigning a class label to a single data record is meaningless. This paper therefore introduces the interval-valued rough set into big data classification. From the algebraic and the information-theoretic viewpoints, it defines interval-valued heuristic reductions based on attribute dependency and on mutual information, respectively, proves their properties, and gives the corresponding algorithms. These results enrich and extend interval-valued rough set theory and suggest a new approach to big data analysis. To suit the distributed storage architecture of big data, the paper further proposes the interval-valued global reduction of multi-decision tables, proves its properties, and gives a global reduction algorithm for multi-decision tables. To make the algorithms more effective in practical applications, approximate reduction is incorporated into all three proposed algorithms. The algorithms are evaluated by steady-state determination on the operating data of a 600 MW unit in a power plant during the first half of 2012. Experimental results show that, with appropriate parameters, all three algorithms maintain high classification accuracy while greatly reducing the data set in both the number of objects and the number of attributes. This supports further analysis and processing of big data.
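The abstract names a dependency-based heuristic reduction on interval-valued data with an approximate stopping rule, but gives no formulas. The Python sketch below illustrates only the general idea, and every detail is an assumption rather than the paper's definition. Each non-overlapping window of records becomes one object whose attribute values are [min, max] intervals. Two intervals count as similar when their overlap divided by their combined span is at least `theta`. The dependency is the fraction of objects whose similarity class contains a single decision class. "Approximate" means stopping once the dependency reaches `(1 - epsilon)` times that of the full attribute set. The names `to_intervals`, `similarity`, `dependency` and `approximate_reduct`, and the parameters `theta` and `epsilon`, are hypothetical. The mutual-information variant and the multi-decision-table global reduction are not shown.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def to_intervals(records: Sequence[Sequence[float]], window: int) -> List[List[Interval]]:
    """Assumed construction: each non-overlapping window of `window` records becomes
    one object with [min, max] per attribute; a trailing partial window is dropped."""
    objects = []
    for start in range(0, len(records) - window + 1, window):
        block = records[start:start + window]
        objects.append([(min(col), max(col)) for col in zip(*block)])
    return objects


def similarity(x: Interval, y: Interval) -> float:
    """Assumed interval similarity: overlap length / length of the combined span."""
    span = max(x[1], y[1]) - min(x[0], y[0])
    if span == 0:
        return 1.0  # both intervals are the same single point
    overlap = max(0.0, min(x[1], y[1]) - max(x[0], y[0]))
    return overlap / span


def dependency(objects: Sequence[Sequence[Interval]], labels: Sequence[str],
               attrs: Sequence[int], theta: float) -> float:
    """gamma_B(D) = |POS_B(D)| / |U| using theta-similarity classes on attrs."""
    n = len(objects)
    positive = 0
    for i in range(n):
        # Objects theta-similar to object i on every attribute in attrs.
        neighbours = [j for j in range(n)
                      if all(similarity(objects[i][a], objects[j][a]) >= theta
                             for a in attrs)]
        # Object i is in the positive region if all its neighbours share its label.
        if all(labels[j] == labels[i] for j in neighbours):
            positive += 1
    return positive / n


def approximate_reduct(objects: Sequence[Sequence[Interval]], labels: Sequence[str],
                       theta: float = 0.5, epsilon: float = 0.0) -> List[int]:
    """Greedy forward selection that stops once gamma >= (1 - epsilon) * gamma_C."""
    all_attrs = list(range(len(objects[0])))
    target = (1.0 - epsilon) * dependency(objects, labels, all_attrs, theta)
    reduct: List[int] = []
    current = dependency(objects, labels, reduct, theta)
    while current < target:
        best, best_gamma = -1, -1.0
        for a in all_attrs:
            if a in reduct:
                continue
            gamma = dependency(objects, labels, reduct + [a], theta)
            if gamma > best_gamma:  # ties keep the lowest attribute index
                best, best_gamma = a, gamma
        reduct.append(best)
        current = best_gamma
    return reduct


if __name__ == "__main__":
    raw = [[1, 10], [3, 12], [2, 11], [7, 5], [8, 6], [9, 4]]
    print(to_intervals(raw, window=3))  # [[(1, 3), (10, 12)], [(7, 9), (4, 6)]]
```

Adding an attribute can only make the similarity relation stricter, so the dependency never decreases along the greedy path. The loop therefore ends at the latest when every condition attribute has been added. A larger `epsilon` trades some dependency for a smaller attribute subset.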