The governments of Canada and Alberta are implementing a joint plan for oil sands monitoring that includes investigating the emissions, transport and downwind chemistry associated with the Canadian oil sands region. As part of that effort, Environment Canada's Global Environmental Multiscale-Modelling Air-quality And CHemistry (GEM-MACH) system was reconfigured for the first time to create nested air-quality forecasts at model grid resolutions down to 2.5 km, with the highest-resolution domain including the Canadian provinces of Alberta and Saskatchewan. The forecasts were used to direct an airborne research platform during a summer 2013 monitoring intensive. Subsequent work with the modelling system has included an in-depth comparison of the model predictions with monitoring network observations, and with airborne and surface supersite observations from the field intensive. A year of model predictions was compared with monitoring network observations, and model values were compared with aircraft observations along the flight tracks. The relative impact of different model versions (including modified emissions and feedbacks between weather and air pollution) will be discussed. Model-based predictions of indicators of human health impacts (i.e., the Air Quality Health Index) and ecosystem impacts (i.e., deposition of pollutants) for the region will also be described.
Luo G, Zhang W, Zhang J, Cong J. Scaling Up Physical Design, in Proceedings of the 2016 on International Symposium on Physical Design - ISPD '16. New York, New York, USA: ACM Press; 2016:131–137.
As a traditional application on various supercomputers, atmospheric modeling has long suffered from low performance efficiency. In this paper, we pick the 3D Euler equation solver (the most essential dynamic component of a non-hydrostatic atmospheric model) as the target application, and explore the maximum performance efficiency that can be achieved on CPU-GPU hybrid architectures. Besides presenting a suitable hybrid domain decomposition methodology and making proper use of tuning techniques for both the CPU and GPU parts, we further propose a novel GPU tuning technique, namely a customizable data caching mechanism with a thread warp rescheduling scheme, which is specifically designed for the Euler solver. Combining all of these optimizations, a remarkable performance boost has been achieved on mainstream GPU architectures including the Tesla Fermi C2050, K20X, K40 and K80. In particular, on the latest Tesla K80, we demonstrate a 31.64× speedup over the performance of a 12-core E5-2697 CPU. In addition, on a hybrid CPU-GPU node with two 12-core E5-2697 CPUs and two Tesla K80 GPUs, a sustained double-precision performance of 1.04 Tflops (16% of the peak) is achieved, which is remarkably higher than the efficiency of similar optimization efforts on heterogeneous platforms (strictly less than 10%, as demonstrated in the related work). Furthermore, a nearly linear weak scaling efficiency is achieved, which demonstrates the effectiveness of our domain decomposition method.
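The two efficiency figures quoted in this abstract (fraction of peak and weak scaling efficiency) can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the node peak is not stated in the abstract and is merely back-computed from the rounded "16%" figure, and the weak-scaling timings are made-up placeholders rather than results from the paper.

```python
def fraction_of_peak(sustained_tflops: float, peak_tflops: float) -> float:
    """Sustained performance divided by theoretical peak performance."""
    return sustained_tflops / peak_tflops


def implied_peak(sustained_tflops: float, fraction: float) -> float:
    """Peak performance implied by a sustained rate and a reported fraction of peak."""
    return sustained_tflops / fraction


def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Weak scaling: work per node is fixed, so ideal runtime stays constant.

    Efficiency is the baseline runtime divided by the runtime on more nodes.
    """
    return t_base / t_scaled


# Figures from the abstract: 1.04 Tflops sustained, reported as 16% of peak.
# Because 16% is a rounded value, the implied peak is only approximate.
node_peak = implied_peak(1.04, 0.16)
print(f"implied node peak: about {node_peak:.1f} Tflops")

# Hypothetical timings (NOT from the paper): 10.0 s on one node,
# 10.5 s on many nodes with the same per-node problem size.
eff = weak_scaling_efficiency(10.0, 10.5)
print(f"weak scaling efficiency: {eff:.1%}")
```

A weak scaling efficiency close to 100% is what "nearly linear" refers to here: as nodes are added with a fixed per-node problem size, the runtime stays nearly constant.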