CAJ | Academic paper

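The plane-wave-based first-principles calculation method is currently the most widely used method in materials science, but traditional CPU parallel computing has run into a scalability bottleneck and can no longer reduce the absolute time to solution. This paper systematically presents Ultra-Mat, a large-scale first-principles materials computing software package developed with graphics processing unit (GPU) acceleration. The software systematically redesigns and implements the first-principles plane-wave algorithm: (1) a parallelization scheme makes the fast Fourier transform (FFT) a local operation on each GPU; (2) a mixed-precision algorithm based on data compression significantly reduces MPI (message passing interface) communication in the electronic structure calculation; (3) more than 90% of the code runs on the GPU, with the aim of minimizing intermediate steps and thereby avoiding the data transfers triggered by switching between CPU and GPU, which are widely recognized as a performance bottleneck in GPU applications. Test results show that Ultra-Mat performs well: for a 512-atom GaAs system, the electronic structure calculation runs 18 times faster on 256 GPU cards than on 4096 CPU cores.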
First-principles calculation based on plane waves is the most popular method in materials science simulation. However, traditional CPU parallelization has hit a scalability bottleneck, so the absolute computing time cannot be reduced further by using more CPU cores. This paper presents Ultra-Mat, a first-principles calculation software package for large-scale GPU (graphics processing unit) clusters. It redesigns and reimplements the algorithm as follows: (1) it uses a hybrid parallelization scheme so that each FFT (fast Fourier transform) runs on a single GPU card; (2) it designs and implements a mixed-precision algorithm that reduces CPU-GPU memory copies and MPI (message passing interface) communication; (3) it implements more than 90% of the code in CUDA, which reduces CPU-GPU memory copy operations, an accepted bottleneck on heterogeneous supercomputers. For a 512-atom GaAs system, the test results show that using 256 GPU cards gives an 18-fold speedup in the electronic structure calculation compared with 4096 CPU cores.
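The abstract does not say how the FFT is made a local GPU operation. One common pattern in plane-wave codes, assumed here rather than taken from the paper, is a hybrid layout: wavefunction coefficients are distributed over G-vectors for some steps and over bands for the FFT, with an all-to-all transpose in between, so that no single band's 3D FFT crosses a device boundary. The sketch below simulates the ranks as Python lists instead of using MPI; `block_range`, `g_distributed`, and `to_band_distributed` are hypothetical names.

```python
from typing import List

# coeffs[band][g]: plane-wave coefficient of a band at G-vector index g.
Matrix = List[List[complex]]


def block_range(n: int, parts: int, rank: int) -> range:
    """Indices owned by `rank` when n items are split into `parts` contiguous blocks."""
    base, extra = divmod(n, parts)
    start = rank * base + min(rank, extra)
    return range(start, start + base + (1 if rank < extra else 0))


def g_distributed(coeffs: Matrix, parts: int) -> List[Matrix]:
    """G-vector layout: every rank holds all bands but only its slice of G-vectors.
    A 3D FFT of one band in this layout would need inter-rank communication."""
    n_g = len(coeffs[0])
    return [
        [[row[g] for g in block_range(n_g, parts, r)] for row in coeffs]
        for r in range(parts)
    ]


def to_band_distributed(local: List[Matrix], n_bands: int) -> List[Matrix]:
    """Simulated all-to-all transpose: rank r gathers the full G-vector set for
    its own block of bands, so each of those bands can be transformed locally."""
    parts = len(local)
    result: List[Matrix] = []
    for r in range(parts):
        mine: Matrix = []
        for b in block_range(n_bands, parts, r):
            full_band: List[complex] = []
            for src in range(parts):  # G-slices arrive in order from ranks 0..P-1
                full_band.extend(local[src][b])
            mine.append(full_band)
        result.append(mine)
    return result
```

After the transpose, each rank could map its complete bands onto the real-space grid and transform them with a single-device FFT library such as cuFFT. The communication cost moves into the transpose itself, which is one place where shrinking message size pays off.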
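The abstract describes the mixed-precision algorithm as based on data compression. One plausible reading, stated here as an assumption and not as Ultra-Mat's actual method, is that coefficients stay in double precision for computation but are rounded to single precision for MPI messages, halving the bytes sent. The sketch shows only that packing step in plain Python; `pack_double`, `pack_single`, and `unpack_single` are hypothetical names, and no MPI library is involved.

```python
from array import array
from typing import List


def pack_double(coeffs: List[complex]) -> bytes:
    """Reference encoding: interleaved (real, imag) as 64-bit doubles, 16 bytes per value."""
    flat = array("d")
    for c in coeffs:
        flat.extend((c.real, c.imag))
    return flat.tobytes()


def pack_single(coeffs: List[complex]) -> bytes:
    """Compressed encoding: the same values rounded to 32-bit floats, 8 bytes per value.
    In a real code this buffer would be passed to an MPI send instead of the double one."""
    flat = array("f")
    for c in coeffs:
        flat.extend((c.real, c.imag))  # each component is rounded to float32
    return flat.tobytes()


def unpack_single(buf: bytes) -> List[complex]:
    """Receiver side: widen float32 pairs back to complex doubles for further computation."""
    flat = array("f")
    flat.frombytes(buf)
    return [complex(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
```

The trade-off is accuracy: float32 keeps roughly 7 significant decimal digits, so a scheme like this only suits data whose rounding error the surrounding iteration can tolerate or correct.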
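The 18-fold figure can also be read per device. Assuming both runs do the same total work and attributing throughput evenly across devices (a simplification that ignores scaling losses), one GPU card delivered roughly the electronic-structure throughput of 288 CPU cores in this test:

$$
S = \frac{T_{\text{CPU}}(4096\ \text{cores})}{T_{\text{GPU}}(256\ \text{GPUs})} = 18,
\qquad
\frac{4096}{256} = 16\ \text{cores per GPU},
\qquad
S \times 16 = 288 .
$$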