China Digital Medicine (中国数字医学)
2014, No. 8, pp. 89-92 (4 pages)
Authors: Lin Denan (林德南), Zhu Yuanyan (朱远燕), Wang Hao (王浩), Wang Shuang (王爽), Zheng Jing (郑静)
Keywords: medical data; time baseline; massive small files; data integration technology
As more hospitals go digital and regional healthcare applications expand in scope, unstructured and semi-structured medical data are growing explosively, and traditional technical architectures are increasingly unable to handle such massive volumes of data. The Shenzhen regional health information data exchange platform covers 60 public hospitals and more than 600 community health institutions across the city. It connects nearly 50 heterogeneous systems, currently holds more than 17 million health records and over 3 billion clinical records, and generates on average more than 5 million small files per day. To address the low efficiency of exchanging and processing this mass of small files in Shenzhen's regional health informatization, this paper proposes a Hadoop-based solution that combines time-baseline archive files with sequence files to improve the storage and retrieval efficiency of small files; practical use has verified that the approach meets the data exchange requirements of real business applications. The paper describes the implementation in detail, including setting the time baseline according to the volume of business data, customizing data types and data structures to business requirements, merging small files and storing them in blocks, building a mapping from each small file to the large file that contains it, and the associated data exchange workflow. The approach was evaluated against conventional techniques on real data, and the results show that it markedly improves the efficiency of batch processing of small files.
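The abstract's core idea is to group small files by a time baseline, merge each group into one large file, and keep a small-file-to-large-file index for retrieval. The following is a minimal in-memory sketch of that idea, not the authors' implementation. It assumes a fixed window (one day in the example) as the baseline and uses plain byte concatenation with an offset index in place of real Hadoop archive or SequenceFile containers. The names `baseline_key`, `pack_small_files` and `read_small_file` are hypothetical and are not part of any Hadoop API.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def baseline_key(ts: datetime, window: timedelta, origin: datetime) -> datetime:
    """Return the start of the time-baseline window that contains ts.

    Assumption: baselines are fixed-length windows counted from `origin`.
    """
    n = (ts - origin) // window  # timedelta // timedelta -> int (floor)
    return origin + n * window


def pack_small_files(files, window: timedelta, origin: datetime):
    """Merge small files into one container per time-baseline window.

    files: iterable of (name, timestamp, payload bytes); names are assumed unique.
    Returns (containers, index), where containers maps container_id -> bytes and
    index maps small-file name -> (container_id, offset, length).
    """
    groups = defaultdict(list)
    for name, ts, data in files:
        groups[baseline_key(ts, window, origin)].append((name, data))

    containers = {}
    index = {}
    for start, members in sorted(groups.items()):
        container_id = start.strftime("%Y%m%d%H%M")  # hypothetical naming scheme
        buf = bytearray()
        for name, data in members:
            # Record where this small file lives inside the merged container.
            index[name] = (container_id, len(buf), len(data))
            buf.extend(data)
        containers[container_id] = bytes(buf)
    return containers, index


def read_small_file(name, containers, index) -> bytes:
    """Retrieve one small file through the small-to-large mapping."""
    container_id, offset, length = index[name]
    return containers[container_id][offset:offset + length]


if __name__ == "__main__":
    origin = datetime(2014, 1, 1)
    files = [
        ("a.xml", datetime(2014, 8, 1, 9, 30), b"<rec>A</rec>"),
        ("b.xml", datetime(2014, 8, 1, 17, 5), b"<rec>B</rec>"),
        ("c.xml", datetime(2014, 8, 2, 8, 0), b"<rec>C</rec>"),
    ]
    containers, index = pack_small_files(files, timedelta(days=1), origin)
    print(sorted(containers))  # ['201408010000', '201408020000']
    print(read_small_file("b.xml", containers, index))  # b'<rec>B</rec>'
```

In a Hadoop deployment each container would typically be an HDFS file, for example a SequenceFile whose key is the small-file name and whose value is its bytes. The NameNode then tracks one object per baseline window instead of one per small file. The window length is the tuning knob that the abstract ties to business data volume.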