Cui Z, Jiang T. No-Reference Video Shakiness Quality Assessment, in Computer Vision - ACCV 2016 - 13th Asian Conference on Computer Vision, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part V. Vol 10115. Springer; 2016:396–411.
Luo G, Zhang W, Zhang J, Cong J. Scaling Up Physical Design, in Proceedings of the 2016 on International Symposium on Physical Design - ISPD '16. New York, New York, USA: ACM Press; 2016:131–137.
As a traditional application on various supercomputers, atmospheric modeling has long suffered from low performance efficiency. In this paper, we pick the 3D Euler equation solver (the most essential dynamic component of a non-hydrostatic atmospheric model) as the target application, and explore the maximum performance efficiency that can be achieved on hybrid CPU-GPU architectures. Besides presenting a suitable hybrid domain decomposition methodology and making proper use of tuning techniques for both the CPU and GPU parts, we further propose a novel GPU tuning technique, namely a customizable data caching mechanism with a thread warp rescheduling scheme, which is specifically designed for the Euler solver. Combining all the optimization approaches, a remarkable performance boost is achieved on mainstream GPU architectures, including the Tesla Fermi C2050, K20X, K40 and K80. In particular, on the latest Tesla K80, we demonstrate a 31.64× speedup over the performance of a 12-core E5-2697 CPU. In addition, on a hybrid CPU-GPU node with two 12-core E5-2697 CPUs and two Tesla K80 GPUs, a sustained double-precision performance of 1.04 Tflops (16% of the peak) is achieved, which is remarkably higher than the efficiency of similar optimization efforts on heterogeneous platforms (strictly less than 10%, as demonstrated in the related work). Moreover, a nearly linear weak scaling efficiency is achieved, which demonstrates the effectiveness of our domain decomposition method.
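The efficiency figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `implied_peak_tflops` is hypothetical, and because the 16% figure is rounded, the derived peak is approximate rather than a number reported by the paper.

```python
def implied_peak_tflops(sustained_tflops: float, efficiency: float) -> float:
    """Return the peak throughput implied by a sustained rate and its fraction of peak."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return sustained_tflops / efficiency


# Figures quoted in the abstract: 1.04 Tflops sustained at 16% of peak.
peak = implied_peak_tflops(1.04, 0.16)

# Assumption for comparison only: the same node running at the 10% efficiency
# ceiling that the abstract attributes to related work.
at_ten_percent = peak * 0.10

print(f"implied node peak: {peak:.2f} Tflops")
print(f"sustained rate at 10% efficiency: {at_ten_percent:.2f} Tflops")
```

Under these assumptions, the quoted numbers imply a double-precision node peak of roughly 6.5 Tflops, so the 10% efficiency ceiling cited for related work would correspond to about 0.65 Tflops on the same node.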