Arm NN is an open-source inference engine for CPUs, GPUs and NPUs. It bridges the gap between existing NN frameworks and the underlying IP. Arm NN is built on top of the Arm Compute Library (ACL), a collection of highly optimized low-level functions that accelerate inference on the Arm Cortex-A family of CPUs and the Arm Mali family of GPUs. For GPUs, ACL uses OpenCL as its compute API. The OpenCL memory model maps closely to the GPU architecture, making it possible to implement optimizations that significantly reduce global memory accesses. Read on to learn how.
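To illustrate why staging data in on-chip memory cuts global memory traffic, here is a minimal sketch in plain Python rather than OpenCL. It is not ACL code: the function names, the tile size and the read counter are assumptions made for illustration. Nested lists stand in for global memory, and the small per-tile copies stand in for OpenCL local memory shared by a work-group.

```python
def matmul_naive(a, b, n):
    """Every multiply-add reads both operands straight from 'global' memory."""
    global_reads = 0
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]
                global_reads += 2  # one read of A, one read of B
            c[i][j] = acc
    return c, global_reads


def matmul_tiled(a, b, n, tile):
    """Each 'work-group' copies one tile of A and B into 'local' memory and reuses it."""
    assert n % tile == 0, "this sketch assumes n is a multiple of the tile size"
    global_reads = 0
    c = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, tile):          # work-group row
        for bj in range(0, n, tile):      # work-group column
            for bk in range(0, n, tile):  # walk along the shared dimension
                # Cooperative load: each tile element is read from global memory once.
                local_a = [[a[bi + i][bk + k] for k in range(tile)] for i in range(tile)]
                local_b = [[b[bk + k][bj + j] for j in range(tile)] for k in range(tile)]
                global_reads += 2 * tile * tile
                # All further reads in this step hit the local copies only.
                for i in range(tile):
                    for j in range(tile):
                        acc = c[bi + i][bj + j]
                        for k in range(tile):
                            acc += local_a[i][k] * local_b[k][j]
                        c[bi + i][bj + j] = acc
    return c, global_reads


if __name__ == "__main__":
    n, tile = 8, 4
    a = [[i + j for j in range(n)] for i in range(n)]
    b = [[i * j for j in range(n)] for i in range(n)]
    _, naive_reads = matmul_naive(a, b, n)
    _, tiled_reads = matmul_tiled(a, b, n, tile)
    print(naive_reads, tiled_reads)
```

In this model the naive version performs 2·n³ global reads, while the tiled version performs 2·n³/tile, so the saving grows with the tile size that fits in local memory.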
The Intel OpenCL Intercept Layer is one of the company’s efforts to make it easier to debug OpenCL applications and analyze their performance. This cross-platform layer intercepts OpenCL API calls through the OpenCL ICD loader so that CL applications can be analyzed and debugged. The OpenCL Intercept Layer 3.0 release has full support for tracing all OpenCL 3.0 APIs. The update also allows for tracing more vendor-specific CL extensions, proper handling of extension APIs from multiple platforms, emulated support for unified shared memory via shared virtual memory, and a number of other enhancements including bug fixes and performance improvements.
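The general idea of an intercept layer, a shim that sits between the application and the real runtime and records each call before forwarding it, can be sketched in a few lines of Python. This is a conceptual illustration only and does not reflect how the Intercept Layer itself is implemented: `TracingLayer` and `FakeRuntime` are hypothetical names, and `FakeRuntime` is a stand-in, not a real OpenCL binding.

```python
import time


class TracingLayer:
    """Forwards every method call to `target` and records its name and duration."""

    def __init__(self, target):
        self._target = target
        self.trace = []  # list of (call name, elapsed seconds)

    def __getattr__(self, name):
        # Only reached for attributes not found on the layer itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return attr(*args, **kwargs)  # forward to the real implementation
            finally:
                self.trace.append((name, time.perf_counter() - start))

        return wrapper


class FakeRuntime:
    """Hypothetical stand-in for a compute runtime; not a real OpenCL API."""

    def create_buffer(self, size):
        return bytearray(size)

    def enqueue_kernel(self, name, global_size):
        return f"{name}:{global_size}"


if __name__ == "__main__":
    runtime = TracingLayer(FakeRuntime())
    runtime.create_buffer(16)
    runtime.enqueue_kernel("saxpy", 1024)
    for call, seconds in runtime.trace:
        print(f"{call} took {seconds:.6f}s")
```

Because the application talks to the layer exactly as it would to the runtime, tracing can be switched on without changing application code, which is the property that makes this style of tool convenient for debugging.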
Today, the Khronos OpenCL Working Group is happy to announce the release of the finalized OpenCL 3.0 specifications, including a new unified OpenCL C 3.0 language specification, together with an early release of a Khronos OpenCL SDK to help developers get up to speed quickly with OpenCL.
Radeon Open Compute 3.7 has an open-source OpenCL Image implementation. With previous releases, a binary-only libhsa-ext-image64.so library was required for OpenCL Image support with the ROCm stack. With the new ROCm 3.7 release, AMD has quietly added the source code as part of the ROCR run-time.
The TensorFlow team has announced the official launch of its OpenCL-based mobile GPU inference engine for Android, which offers up to a roughly 2x speedup over the existing OpenGL backend on reasonably sized neural networks that have enough workload for the GPU.
Intel has released oneAPI DPC++ Compiler 2020-05, the latest snapshot of its LLVM-based Data Parallel C++ Compiler. This release brings various SYCL front-end and driver improvements, includes the OpenCL ahead-of-time compilation tool in the sycl-toolchain target, and much more.
This year, IWOCL & SYCLcon 2020 had a record number of high-quality submissions in all categories: research papers, technical presentations, tutorials and posters. Video presentations from the IWOCL & SYCLcon 2020 program of papers, technical presentations and posters are now online.
The Khronos Group publicly releases the OpenCL 3.0 Provisional Specifications. OpenCL 3.0 realigns the OpenCL roadmap to enable developer-requested functionality to be broadly deployed by hardware vendors, and it significantly increases deployment flexibility by empowering conformant OpenCL implementations to focus on functionality relevant to their target markets. OpenCL 3.0 also integrates subgroup functionality into the core specification, ships with a new OpenCL C 3.0 language specification, uses a new unified specification format, and introduces extensions for asynchronous data copies to enable a new class of embedded processors. The provisional OpenCL 3.0 specifications enable the developer community to provide feedback on GitHub before the specifications and conformance tests are finalized.
The 8th International Workshop on OpenCL, SYCL, Vulkan and SPIR-V starts today, April 27th 2020, and will be a digital-only event. The complete conference program is online, starting with the SYCL tutorials and ‘An Introduction to SYCL’ presented by Codeplay, Heidelberg University, Intel and Xilinx. Registration is free. Listen now to Michael Wong, SYCL Working Group Chair, give a SYCL State of the Union, with slides and video.
The Folding@Home non-profit organization has created the world’s fastest supercomputer from volunteers loaning spare time on their home PCs to fold proteins, a task that could prove instrumental in the fight against the coronavirus. Scientists are using this enormous amount of compute power to simulate viral proteins in an effort to reveal new coronavirus therapeutic treatments.
Folding@Home uses the Khronos OpenCL™ open standard for parallel programming to offload computations onto the GPUs contained in the networked home PCs that are often used for gaming – significantly boosting available compute power.
According to Folding@Home, the combined power of its network broke 1,000,000,000,000,000,000 operations per second – or one “exaflop” – on 25 March, making it the world’s fastest supercomputer. In fact, it is six times more powerful than the current world’s fastest traditional supercomputer, the IBM Summit, which is used for scientific research at the US’s Oak Ridge National Laboratory. By April 13, it had more than doubled that, hitting a new record of 2.4 exaflops, faster than the top 500 traditional supercomputers combined, thanks to almost 1 million new members of the network (Source: The Guardian).
Codeplay looks forward to IWOCL every year since the conference is laser-focused on two of their favorite topics: OpenCL and SYCL. This year they are excited to be part of the first co-hosted IWOCL and SYCLcon, with SYCL bringing a full track of presentations to the event. Learn more and join Codeplay, Khronos and many others online.
PoCL is a portable open-source (MIT-licensed) implementation of the OpenCL standard (1.2 with some 2.0 features supported). In addition to being an easily portable multi-device open-source OpenCL implementation, a major goal of this project is improving interoperability among diverse OpenCL-capable devices by integrating them into a single centrally orchestrated platform. Upstream PoCL currently supports various CPUs, NVIDIA GPUs via libcuda, HSA-supported GPUs and TCE ASIPs (experimental, see: http://openasip.org). It is also known to have multiple (private) ports. The 1.5 release adds support for Clang/LLVM 10.0, easy-to-use kernel profiling features and plenty of fixes and performance improvements.