计算机科学与探索
JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY
2013
Issue 5
472-479
8 pages in total
文敏华%林新华%Simon Chong Wee See
compute unified device architecture (CUDA)%graphics processing unit (GPU)%direct simulation Monte Carlo (DSMC)%dynamic collision grid DSMC%parallel simulation
The direct simulation Monte Carlo (DSMC) method is an important computational tool in rarefied gas dynamics. However, it has two main drawbacks: complex grid processing and a heavy computational load. The dynamic collision grid DSMC method addresses the first drawback by generating adaptive collision grids dynamically from the flow field. To address the second, a parallel program written with compute unified device architecture (CUDA) ports the dynamic collision grid DSMC method to the graphics processing unit (GPU) to reduce computing time. In the parallel implementation, the GPU performs the vast majority of the computation, while the CPU handles only a small amount of work such as initialization and result output. A two-dimensional supersonic flow over a flat plate is used as the test case to verify the correctness of the parallel program. For test cases of different sizes, speedups of more than 10 times are obtained on the NVIDIA Fermi C2050. For the same test case, the newly released NVIDIA Kepler K20 runs about 1.3 to 1.6 times as fast as the Fermi C2050.