As a traditional application on various supercomputers, atmospheric modeling has long been suffering from the low performance efficiency. In this paper, we pick the 3D Euler equation solver (the most essential dynamic component for a non-hydrostatic atmospheric model) as the target application, and explore the maximum performance efficiency that can be achieved on CPU-GPU hybrid architectures. Besides presenting the suitable hybrid domain decomposition methodology and taking proper usage of tuning techniques for both the CPU and GPU parts, we further propose a novel GPU tuning technique, namely the customizable data caching mechanism with thread warp rescheduling scheme, which is specifically designed for the Euler solver. Combining all the optimizing approaches together, remarkable performance boost has been achieved on mainstream GPU architectures including Tesla Fermi C2050, K20×, K40 and K80. Especially, on the latest Tesla K80, we demonstrate a 31.64× speedup over the performance of 12-core E5-2697 CPU. In addition, based on a hybrid CPU-GPU node with two 12-core E5-2697 CPUs and two Tesla K80 GPUs, a sustained double-precision performance of 1.04 Tflops (16% of the peak) is achieved, which is remarkably higher than the efficiency of similar optimizing tasks based on heterogeneous platforms (strictly less than 10%, as demonstrated in the related work). In addition, a nearly linear weak scaling efficiency is achieved which demonstrate the effectiveness of our domain decomposition method.
Wang S, Gu K, Ma S, Lin W, Zhang X, Gao W. Cloud Based Image Contrast Enhancement, in 2015 IEEE International Conference on Multimedia Big Data, BigMM 2015, Beijing, China, April 20-22, 2015.; 2015:148–155. 访问链接
Shen X, Zou L, Özsu TM, Chen L, Li Y, Han S, Zhao D. A graph-based RDF triple store, in 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, April 13-17, 2015.; 2015:1508–1511. [pdf]