LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014)
The workshop addresses approaches to the design challenges of power-efficient GPU and many-core computing. Topics include:
Heterogeneous Many-core Architectures including Mobile and Embedded Platforms
GPU Programming Models, APIs, Languages, Tools and Compilers
Low-Power Application Case Studies and Performance Evaluations
“Green” High Performance Computing
OpenCL-Related Talks
OpenCL heterogeneous portability – theory and practice
Speaker: Ayal Zaks (Intel Corporation, Israel)
Abstract: OpenCL is designed to provide portability across heterogeneous devices. Yet some aspects of portability are still challenging. We examine several current trends in research and industry striving to further improve OpenCL’s support for heterogeneous portability.
Performance portability for embedded GPUs
Speaker: Simon McIntosh-Smith (Head of the microelectronics group at the University of Bristol, UK)
Abstract: As the range of embedded GPUs from ARM, Imagination, Qualcomm, Nvidia and others grows, the challenge for software developers also grows. Developing GPU computing applications which are not only functionally portable, but also performance portable, across this diverse range of GPUs becomes a critical issue which must be solved in order for software developers to more broadly adopt embedded GPU computing. The wide range in scale of the performance of embedded GPUs further complicates the issue. In this talk we will look at some recent work at the University of Bristol which exploits OpenCL to evaluate performance portability across embedded GPU platforms and across GPUs ranging from a few GFLOPS to 100 GFLOPS.
An Insight into the Insieme Compiler Automatic Partitioning for Heterogeneous Platforms
Speaker: Biagio Cosenza (University of Innsbruck, Austria)
Abstract: Unleashing the full potential of heterogeneous systems, consisting of multi-core CPUs and GPUs, is a challenging task due to the differences in processing capabilities, memory availability, and communication latencies of the different computational resources. The Insieme Compiler manages these differences by deriving a prediction model based on machine learning (ANN and SVM) which incorporates static program features as well as dynamic, input-sensitive features. This talk describes how this approach has been used to perform automatic input-sensitive task partitioning of OpenCL programs and discusses the energy vs. performance trade-off on similar platforms.
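The partitioning step mentioned in the abstract can be pictured with a minimal sketch. It assumes the share of work for the GPU is already known; in the approach described above that value would come from the learned prediction model, which is not reproduced here. The names `Partition` and `split_range` are hypothetical and are not part of the Insieme Compiler or of OpenCL.

```cpp
#include <cstddef>

// Hypothetical result type: two half-open index ranges [begin, end).
struct Partition {
    std::size_t cpu_begin, cpu_end;
    std::size_t gpu_begin, gpu_end;
};

// Split the work items [0, n) between a CPU and a GPU.
// gpu_share is the fraction of items assigned to the GPU. Here it is a
// plain input (assumption); a real system would predict it per program
// and per input size.
inline Partition split_range(std::size_t n, double gpu_share) {
    if (gpu_share < 0.0) gpu_share = 0.0;
    if (gpu_share > 1.0) gpu_share = 1.0;
    // Round to the nearest whole number of work items.
    const std::size_t gpu_items =
        static_cast<std::size_t>(static_cast<double>(n) * gpu_share + 0.5);
    const std::size_t cpu_items = n - gpu_items;
    // The CPU takes the first cpu_items items, the GPU takes the rest.
    return Partition{0, cpu_items, cpu_items, n};
}
```

For example, `split_range(1000, 0.75)` assigns items 0 to 249 to the CPU and items 250 to 999 to the GPU. Each range could then be enqueued on its own device.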
Parallel H.264/AVC Motion Compensation for GPUs using OpenCL
Speaker: Biao Wang (Technische Universität Berlin)
Abstract: Motion Compensation (MC) is one of the most compute-intensive parts of H.264/AVC video decoding. It exposes massive parallelism that Graphics Processing Units (GPUs) can exploit. However, the divergence caused by different interpolation modes in MC can lead to a significant performance penalty on GPUs. In this work, we propose a novel multi-stage approach to parallelize the MC kernel for GPUs using OpenCL. The proposed approach mitigates the divergence by exploiting the fact that different interpolation modes share common computation stages. In addition, the optimized kernel has been integrated into an FFmpeg decoder that supports the H.264/AVC High profile. We evaluated our kernel on GPUs with different architectures shipped by AMD, Intel, and Nvidia. Compared to a CPU implementation, our kernel achieves maximum speedups of 3.27 and 3.59 for 1080p and 2160p videos, respectively. Furthermore, we applied zero-copy optimization for integrated GPUs from AMD and Intel to eliminate memory copy overhead between CPU and GPU.
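The idea of shared stages can be illustrated with a simplified, serial, one-dimensional sketch. It is not the authors' OpenCL kernel. It assumes 8-bit samples stored as `int`, handles row edges by clamping indices, and uses the H.264 six-tap half-sample filter (1, -5, 20, 20, -5, 1) followed by quarter-sample averaging. The function names are hypothetical. Stage 1 runs the expensive filter for every position without any per-mode branch; only the cheap stage 2 depends on the fractional mode.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Read row[k], clamping k to the valid range (edge handling is an assumption).
inline int sample_at(const std::vector<int>& row, int k) {
    const int last = static_cast<int>(row.size()) - 1;
    return row[static_cast<std::size_t>(std::min(last, std::max(0, k)))];
}

// Shared stage: half-sample value between row[i] and row[i + 1].
inline int half_sample(const std::vector<int>& row, int i) {
    const int sum = sample_at(row, i - 2) - 5 * sample_at(row, i - 1)
                  + 20 * sample_at(row, i) + 20 * sample_at(row, i + 1)
                  - 5 * sample_at(row, i + 2) + sample_at(row, i + 3);
    // Round, divide by 32, and clip to [0, 255].
    return std::min(255, std::max(0, sum + 16) >> 5);
}

// frac[i] is the fractional offset of output i in quarter samples (0..3).
inline std::vector<int> interpolate_row(const std::vector<int>& row,
                                        const std::vector<int>& frac) {
    const int n = static_cast<int>(row.size());
    // Stage 1: identical work for every position, so no divergence.
    std::vector<int> half(row.size());
    for (int i = 0; i < n; ++i) half[i] = half_sample(row, i);
    // Stage 2: cheap mode-dependent select or average.
    std::vector<int> out(row.size());
    for (int i = 0; i < n; ++i) {
        const int full = row[i];
        const int next = sample_at(row, i + 1);
        switch (frac[i]) {
            case 0:  out[i] = full; break;                       // integer position
            case 1:  out[i] = (full + half[i] + 1) >> 1; break;  // quarter
            case 2:  out[i] = half[i]; break;                    // half
            default: out[i] = (half[i] + next + 1) >> 1; break;  // three quarters
        }
    }
    return out;
}
```

On a GPU, each loop would correspond to a separate pass over the block, so threads that need different interpolation modes still execute the same filter code in the first pass.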
Nema3D: An OpenGL/OpenCL Embedded Programmable Engine
Speakers: Georgios Keramidas and Iakovos Stamoulis (Think Silicon Ltd.)
Abstract: Nema3D is the new programmable core designed by Think Silicon Ltd. (www.think-silicon.com). Nema3D is a multithreaded processing core powered by an intelligent software-hardware co-design approach, in-house compiler support (LLVM-based), a highly reconfigurable architecture, and new low-power architectural-level techniques. Nema3D is designed to support the latest API standards of the Khronos Group (OpenGL ES 3.0 and OpenCL 1.2) in the same silicon footprint, while support for image and vision acceleration is under investigation. As part of this presentation, the design philosophy and the architectural organization of the Nema3D core will be outlined.
Fusing GPU kernels with a novel single-source C++ API
Speaker: Ralph Potter (Codeplay Software Ltd. / University of Bath, UK)
Abstract: Ongoing and rapidly maturing compiler and API research by Codeplay aims to provide a higher-level, single-source, industry-focused C++-based interface to OpenCL. We investigate opportunities for compiler-based kernel fusion utilizing features from C++11, including lambda functions, variadic templates, and lazy evaluation using std::bind expressions.
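As a rough CPU-side analogy for kernel fusion, the sketch below uses C++11 lambdas and a variadic template to apply several element-wise stages in a single pass instead of one pass per stage. It is not Codeplay's API and involves no OpenCL; `apply_stages`, `run_fused`, and `run_single` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// No stages left: return the value unchanged.
inline float apply_stages(float x) { return x; }

// Apply the first stage to one element, then the remaining stages.
template <typename F, typename... Rest>
float apply_stages(float x, F first, Rest... rest) {
    return apply_stages(first(x), rest...);
}

// Fused version: one loop applies every stage to each element, so no
// intermediate buffer is written between stages.
template <typename... Stages>
std::vector<float> run_fused(const std::vector<float>& in, Stages... stages) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = apply_stages(in[i], stages...);
    }
    return out;
}

// Unfused reference: one stage per pass, one full buffer per stage.
template <typename F>
std::vector<float> run_single(const std::vector<float>& in, F stage) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = stage(in[i]);
    }
    return out;
}
```

Given `auto scale = [](float x) { return x * 2.0f; };` and `auto offset = [](float x) { return x + 1.0f; };`, the call `run_fused(in, scale, offset)` produces the same values as `run_single(run_single(in, scale), offset)` while touching memory once. Any callable taking and returning `float` can serve as a stage, including a `std::bind` expression.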