CAJ | Academic paper

随着越来越多的医院开展数字化建设以及区域医疗应用范围的扩大，大量非结构化、半结构化医疗数据爆发式的增长，传统的技术架构在处理海量数据方面显得越来越乏力。深圳市区域卫生信息化数据交换平台，覆盖了全市60家公立医院、600多家社区卫生机构。平台接入近50个异构系统，现有1700多万份健康档案、30亿条以上诊疗数据，平均每天产生500万以上的小文件。针对深圳市卫生区域信息化建设，海量小文件交换处理效率低下的问题，利用Hadoop平台，提出了采用时间基线归档文件技术和序列文件技术解决小文件存储、检索效率问题的解决方案，经验证实该技术可满足实际业务应用中对数据交换的需要。详细描述了该技术的实现细节，包括根据业务数据规模划定时间基线，根据业务需求定制数据类型、数据结构，将小文件合并分块存储，建立小文件到大文件的映射以及相关数据交换处理流程等，并基于真实数据对该技术进行了评测比较，结果表明上述技术与常规技术相比明显提升了批量处理小文件的效率。
As more hospitals digitize and the scope of regional medical applications expands, unstructured and semi-structured medical data have grown explosively, and traditional technical architectures are increasingly unable to handle such volumes. The Shenzhen regional health information data exchange platform covers 60 public hospitals and more than 600 community health institutions in the city. It connects nearly 50 heterogeneous systems, holds more than 17 million health records and over 3 billion diagnosis and treatment records, and generates more than 5 million small files per day on average. To address the low efficiency of exchanging and processing massive numbers of small files in Shenzhen's regional health informatization, this paper proposes a Hadoop-based solution that uses time-baseline archive files and sequence files to improve the storage and retrieval efficiency of small files. Verification shows that the approach meets the data exchange needs of practical business applications. The paper describes the implementation in detail, including delimiting the time baseline according to the scale of business data, customizing data types and data structures according to business requirements, merging small files and storing them in blocks, establishing a mapping from small files to large files, and the related data exchange processing workflow. An evaluation on real data shows that, compared with conventional techniques, the approach significantly improves the efficiency of batch processing of small files.
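The merge-and-map idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation and does not use Hadoop: it keeps everything in memory, assumes a one-day time baseline, and uses hypothetical names (`baseline_key`, `pack_small_files`, `read_small_file`) and made-up sample records. It only shows how small files might be grouped by time baseline, concatenated into one larger container per baseline, and located again through a small-file-to-large-file index.

```python
import io
from collections import defaultdict
from datetime import datetime


def baseline_key(timestamp: datetime) -> str:
    """Assign a file to a time baseline (a one-day window is an assumption)."""
    return timestamp.strftime("%Y-%m-%d")


def pack_small_files(files):
    """Merge (name, timestamp, payload) records into one blob per baseline.

    Returns (containers, index). The index maps each small-file name to
    (baseline, offset, length) inside the merged container.
    """
    buffers = defaultdict(io.BytesIO)
    index = {}
    for name, timestamp, payload in files:
        key = baseline_key(timestamp)
        buf = buffers[key]
        offset = buf.tell()  # current end of this baseline's container
        buf.write(payload)
        index[name] = (key, offset, len(payload))
    containers = {key: buf.getvalue() for key, buf in buffers.items()}
    return containers, index


def read_small_file(containers, index, name):
    """Look up one small file through the index and slice it out."""
    key, offset, length = index[name]
    return containers[key][offset:offset + length]


# Hypothetical sample records, invented for illustration only.
files = [
    ("visit-001.xml", datetime(2014, 3, 1, 8, 15), b"<visit id='1'/>"),
    ("visit-002.xml", datetime(2014, 3, 1, 9, 40), b"<visit id='2'/>"),
    ("lab-001.xml", datetime(2014, 3, 2, 10, 5), b"<lab id='1'/>"),
]
containers, index = pack_small_files(files)
print(sorted(containers))
print(read_small_file(containers, index, "visit-002.xml"))
```

In a real deployment the containers would be large files on HDFS rather than in-memory byte strings, and the index would need to be persisted alongside them. The sketch leaves both out.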