计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2014
Issue 11
Pages 1381-1390
10 pages in total
张远健%徐健锋%涂敏%黄学坚%刘清
ICU%uncertain time series%prediction%machine learning%hybrid framework
Mortality prediction for ICU patients has long been an active and difficult research topic in medicine. Machine learning methods from data mining have made some progress in this domain in recent years, but considerable room for improvement remains. To handle ICU time series, which are both high-dimensional and sampled at uncertain intervals, this paper proposes treating irregular sampling as empty (missing) samples on a regular sampling grid, together with corresponding processing strategies. On this basis, the paper proposes a two-stage hybrid framework that combines traditional time series clustering with multiple machine learning algorithms. In the first stage, the dimensionality and uncertainty of the data set are reduced; in the second stage, classical machine learning algorithms are applied to predict ICU patient mortality. Two groups of experiments on a public data set show that the framework classifies the minority class (patients who died) better than traditional methods, and that elastic time intervals give better predictions. The choice of the best time interval can be validated experimentally.
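The core idea of converting irregular sampling into empty samples on a regular grid can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `regrid` and `fill_empty`, the choice to average multiple readings within a bin, and the forward-fill strategy are all assumptions made for illustration, since the abstract does not specify the processing strategies.

```python
import math


def regrid(samples, interval, horizon):
    """Map irregular (time, value) samples onto fixed-width bins.

    Bins with no observation become NaN ("empty samples").
    Averaging several readings in one bin is an assumed choice.
    """
    n_bins = math.ceil(horizon / interval)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in samples:
        if 0 <= t < horizon:
            idx = int(t // interval)
            sums[idx] += v
            counts[idx] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


def fill_empty(grid):
    """One hypothetical strategy for empty bins: forward fill.

    Leading empty bins take the first observed value.
    """
    observed = [v for v in grid if not math.isnan(v)]
    if not observed:
        return list(grid)
    filled, last = [], observed[0]
    for v in grid:
        if not math.isnan(v):
            last = v
        filled.append(last)
    return filled


# Hypothetical heart-rate readings: (hours since admission, beats per minute).
heart_rate = [(0.5, 80.0), (0.9, 84.0), (3.2, 90.0)]
grid = regrid(heart_rate, interval=1.0, horizon=4.0)
print(grid)              # [82.0, nan, nan, 90.0]
print(fill_empty(grid))  # [82.0, 82.0, 82.0, 90.0]
```

Changing `interval` changes both the length of each patient's vector and the share of empty bins, which is one way to see why the best interval would have to be chosen empirically.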