Many-core GPUs are bringing terascale computing to the desktop and petascale computing to clusters. This new level of computing performance will enable individuals to solve a new class of complex problems. But to fully leverage the power of these processors, users need in-depth knowledge of parallel programming principles, parallelism models, communication models, and the limitations of these resources. This tutorial begins with an introduction to the CUDA and OpenCL programming models, continues with algorithm design and performance tuning techniques, and concludes with a case study. It is designed to give you a foundational understanding of many-core parallel programming so that you are prepared to exploit the potential of massively parallel systems in your own applications.
Level of Tutorial
We assume C/C++ programming skills and conceptual knowledge of parallel software (though not necessarily of data-parallel programming models). Attendees with a basic computer architecture background (e.g., familiarity with registers and SIMD execution) will have an advantage in grasping some of the optimization principles discussed.