计算机工程与应用
計算機工程與應用
계산궤공정여응용
COMPUTER ENGINEERING AND APPLICATIONS
2015
Issue 18
pp. 126-130, 185
6 pages in total
刘峰斌%袁志勇%肖玲%王惠玲%王高华
劉峰斌%袁志勇%肖玲%王惠玲%王高華
류봉빈%원지용%초령%왕혜령%왕고화
数据挖掘%关联规则%关联规则压缩%频繁项集%焦虑%抑郁
數據挖掘%關聯規則%關聯規則壓縮%頻繁項集%焦慮%抑鬱
수거알굴%관련규칙%관련규칙압축%빈번항집%초필%억욱
data mining%association rules%association rule summarization%frequent itemsets%anxiety%depression
针对焦虑抑郁患者的早期预防和诊断需求,将关联规则挖掘和压缩方法应用于焦虑抑郁障碍因素的研究,在病人数据中挖掘出与焦虑抑郁障碍相关性较高的因素集合。单独使用频繁项集挖掘算法会产生过多的频繁项集和关联规则,导致其实用性大为降低。对收集的病人数据进行预处理,采用FP-growth算法,挖掘出预处理后数据中的频繁项集,采用最新改进Bottom-Up Summarization(BUS)算法,对挖掘出的频繁项集进行压缩。同时将最后得到的关联规则与未压缩得到的关联规则、原始BUS算法及Top-K算法压缩后得到的关联规则进行对比。实验结果表明,使用改进BUS算法得到的规则数量适中、信息冗余较少而且覆盖的人群具有更高的患病风险。
針對焦慮抑鬱患者的早期預防和診斷需求,將關聯規則挖掘和壓縮方法應用於焦慮抑鬱障礙因素的研究,在病人數據中挖掘出與焦慮抑鬱障礙相關性較高的因素集合。單獨使用頻繁項集挖掘算法會產生過多的頻繁項集和關聯規則,導致其實用性大為降低。對收集的病人數據進行預處理,採用FP-growth算法,挖掘出預處理後數據中的頻繁項集,採用最新改進Bottom-Up Summarization(BUS)算法,對挖掘出的頻繁項集進行壓縮。同時將最後得到的關聯規則與未壓縮得到的關聯規則、原始BUS算法及Top-K算法壓縮後得到的關聯規則進行對比。實驗結果表明,使用改進BUS算法得到的規則數量適中、信息冗餘較少而且覆蓋的人群具有更高的患病風險。
침대초필억욱환자적조기예방화진단수구,장관련규칙알굴화압축방법응용우초필억욱장애인소적연구,재병인수거중알굴출여초필억욱장애상관성교고적인소집합。단독사용빈번항집알굴산법회산생과다적빈번항집화관련규칙,도치기실용성대위강저。대수집적병인수거진행예처리,채용FP-growth산법,알굴출예처리후수거중적빈번항집,채용최신개진Bottom-Up Summarization(BUS)산법,대알굴출적빈번항집진행압축。동시장최후득도적관련규칙여미압축득도적관련규칙、원시BUS산법급Top-K산법압축후득도적관련규칙진행대비。실험결과표명,사용개진BUS산법득도적규칙수량괄중、신식용여교소이차복개적인군구유경고적환병풍험。
To support early prevention and diagnosis of anxiety and depression, this paper applies association rule mining and summarization methods to patient records to discover sets of factors strongly associated with anxiety and depressive disorders. Used on its own, a frequent itemset mining algorithm produces too many frequent itemsets and association rules, which greatly reduces its practical value. The collected patient data are first preprocessed. The FP-growth algorithm is then used to mine frequent itemsets from the preprocessed data, and a recently improved Bottom-Up Summarization (BUS) algorithm is used to summarize the mined frequent itemsets. The resulting association rules are compared with the rules obtained without summarization and with the rules obtained after summarization by the original BUS algorithm and by the Top-K algorithm. Experimental results show that the improved BUS algorithm yields a moderate number of rules with less redundant information, and that the population covered by these rules has a higher risk of illness.