Following the release of OpenCL™ 3.0 in September 2020, The Khronos® Group continues to expand and grow the ecosystem of this open, royalty-free standard for cross-platform, parallel programming of diverse accelerators found in supercomputers, cloud servers, personal computers, mobile devices, and embedded platforms.
For example, in just the past year, Khronos has released extensions to enhance OpenCL’s performance and flexibility across multiple platforms, including:
- Enhanced subgroup functionality
- Extended bit-level operations
- Queries for a device's universally unique identifier (UUID)
- Enhanced queries for platform and device versions
- Queries for PCI bus information
- SPIR-V support for C++ linkage types
- Queries for a suggested local work size
Now, the OpenCL Working Group is announcing new extensions for two key use cases: boosting neural network inferencing performance, and providing flexible and powerful interoperability with new-generation graphics APIs, including Vulkan.
Integer Dot Product Extension for Neural Network Inferencing
As machine learning technology evolves, neural network inferencing is often optimized to use integer data rather than 32-bit or 16-bit floating-point data. Narrower data types save memory and bandwidth, and simpler integer operations increase processing throughput. Many modern neural network models are designed for inferencing with 8-bit integers, and modern hardware accelerators provide dedicated support for efficiently executing 8-bit integer dot product operations, leading to significant improvements in inferencing performance and power consumption.
Until now, it was not possible for developers to write cross-vendor, portable code to leverage this new generation of hardware capabilities in diverse accelerator architectures. The new OpenCL cl_khr_integer_dot_product extension adds support for SPIR-V instructions and OpenCL C built-in functions to compute the dot product of vectors of integers, making it possible for OpenCL applications and libraries to target optimized inferencing acceleration in a variety of processor types from multiple hardware vendors.
This new integer dot product extension has similar capabilities to the existing Vulkan® VK_KHR_shader_integer_dot_product extension, and is an important step towards faster, more portable machine learning acceleration in multiple Khronos APIs. There are more developments in the pipeline, so stay tuned!
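To make the packed arithmetic concrete, here is a minimal host-side sketch in C that emulates what the extension's packed built-ins (for example `dot_4x8packed_ss_int` and `dot_4x8packed_uu_uint`) compute, which can be handy for checking kernel results on the CPU. The `ref_*` helper names are hypothetical, and the sketch assumes that component 0 occupies the least significant byte of each 32-bit word; the cl_khr_integer_dot_product specification remains the authoritative definition. Note that one 32-bit word carries four 8-bit operands, whereas four 32-bit floats would occupy 16 bytes, which is where the memory and bandwidth savings come from.

```c
#include <stdint.h>

/* Hypothetical host-side reference helpers for checking kernel output.
 * They are NOT part of any OpenCL API.
 * Assumption: component 0 of a packed 4x8-bit word is stored in the
 * least significant byte. */

/* Return component i (0..3) of a packed word as an unsigned 8-bit value. */
static int32_t unpack_u8(uint32_t packed, int i)
{
    return (int32_t)((packed >> (8 * i)) & 0xFFu);
}

/* Return component i (0..3) as a signed 8-bit value, converting from
 * two's complement explicitly to avoid implementation-defined casts. */
static int32_t unpack_s8(uint32_t packed, int i)
{
    int32_t v = unpack_u8(packed, i);
    return v >= 128 ? v - 256 : v;
}

/* Signed x signed dot product; intended to mirror dot_4x8packed_ss_int. */
int32_t ref_dot_4x8packed_ss(uint32_t a, uint32_t b)
{
    int32_t sum = 0;
    for (int i = 0; i < 4; ++i)
        sum += unpack_s8(a, i) * unpack_s8(b, i);
    return sum;
}

/* Unsigned x unsigned dot product; intended to mirror dot_4x8packed_uu_uint.
 * The largest possible result, 4 * 255 * 255, fits easily in 32 bits. */
uint32_t ref_dot_4x8packed_uu(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; ++i)
        sum += (uint32_t)(unpack_u8(a, i) * unpack_u8(b, i));
    return sum;
}

/* Signed dot product added to an accumulator, clamped to the int32 range
 * instead of wrapping; intended to mirror dot_acc_sat_4x8packed_ss_int. */
int32_t ref_dot_acc_sat_4x8packed_ss(uint32_t a, uint32_t b, int32_t acc)
{
    int64_t r = (int64_t)acc + ref_dot_4x8packed_ss(a, b);
    if (r > INT32_MAX)
        return INT32_MAX;
    if (r < INT32_MIN)
        return INT32_MIN;
    return (int32_t)r;
}
```

For example, `ref_dot_4x8packed_ss(0x01020304u, 0x01010101u)` unpacks the components 4, 3, 2, 1 and 1, 1, 1, 1 and returns 10.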
Semaphore and Memory Sharing Extensions for Vulkan Interop
Developers often use additional APIs alongside OpenCL, such as OpenGL®, to access functionality such as graphics rendering. OpenCL has long enabled the sharing of implicit buffer and image objects with OpenGL, OpenGL ES, EGL, Direct3D 10, and Direct3D 11 through extensions such as cl_khr_gl_sharing, cl_khr_gl_event, cl_khr_egl_image, cl_khr_egl_event, cl_khr_d3d10_sharing, and cl_khr_d3d11_sharing.
However, new-generation GPU APIs such as Vulkan use explicit handles to external memory, together with semaphores, to coordinate access to shared resources. Until now, no OpenCL extensions have enabled external memory sharing with this new class of API, despite strong demand from both mobile and desktop developers, particularly for interop between OpenCL and Vulkan.
Now the OpenCL Working Group has released a set of extensions that enable applications to efficiently share data between OpenCL and APIs such as Vulkan, with significantly more flexibility than previous-generation interop APIs based on implicit resources.
This set of new External Memory Sharing extensions provides a generic framework that enables OpenCL to import external memory and semaphore handles exported by external APIs, using a methodology that will be familiar to Vulkan developers. OpenCL can then use those semaphores to synchronize with the external runtime and coordinate use of the shared memory.
External-API-specific interop extensions are then added to handle the details of interacting with particular APIs. OpenCL interop with the Vulkan and DX12 APIs is available today, and support for additional APIs will be added in the future.
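The ordering contract behind this wait/signal handshake can be illustrated without a GPU. The following C sketch is a CPU-side model only, not the OpenCL or Vulkan API: one thread stands in for the external API that writes a shared buffer and then signals, and another stands in for the OpenCL side that waits, processes the buffer, and signals back. The binary semaphore type and every function name are hypothetical, built from POSIX mutexes and condition variables; with the real extensions, the corresponding steps would be enqueued on a command queue using functions such as `clEnqueueWaitSemaphoresKHR` and `clEnqueueSignalSemaphoresKHR`.

```c
#include <pthread.h>
#include <stddef.h>

/* CPU-side MODEL only; not the OpenCL or Vulkan API.
 * All names here are hypothetical. */

/* Minimal binary semaphore: signal sets it; wait blocks until it is set
 * and then resets it to unsignaled. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int signaled;
} binary_sem;

static void bsem_init(binary_sem *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->signaled = 0;
}

static void bsem_destroy(binary_sem *s)
{
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->lock);
}

static void bsem_signal(binary_sem *s)
{
    pthread_mutex_lock(&s->lock);
    s->signaled = 1;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

static void bsem_wait(binary_sem *s)
{
    pthread_mutex_lock(&s->lock);
    while (!s->signaled)            /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->cond, &s->lock);
    s->signaled = 0;                /* a successful wait consumes the signal */
    pthread_mutex_unlock(&s->lock);
}

#define BUFFER_LEN 8

typedef struct {
    int buffer[BUFFER_LEN];     /* stands in for the imported shared memory */
    binary_sem producer_done;   /* "external API finished writing" */
    binary_sem consumer_done;   /* "OpenCL side finished processing" */
} shared_state;

/* Stands in for the OpenCL side: wait, process the buffer, signal back. */
static void *consumer(void *arg)
{
    shared_state *st = arg;
    bsem_wait(&st->producer_done);
    for (int i = 0; i < BUFFER_LEN; ++i)
        st->buffer[i] *= 2;
    bsem_signal(&st->consumer_done);
    return NULL;
}

/* Runs one producer -> consumer -> producer round trip and returns the
 * buffer sum seen by the producer afterwards, or -1 if the thread fails. */
int run_round_trip(void)
{
    shared_state st;
    pthread_t worker;
    int sum = 0;

    bsem_init(&st.producer_done);
    bsem_init(&st.consumer_done);

    if (pthread_create(&worker, NULL, consumer, &st) != 0) {
        sum = -1;
    } else {
        /* Producer ("external API") writes the shared buffer, then signals. */
        for (int i = 0; i < BUFFER_LEN; ++i)
            st.buffer[i] = i;
        bsem_signal(&st.producer_done);

        /* The producer must not touch the buffer until the consumer signals. */
        bsem_wait(&st.consumer_done);
        for (int i = 0; i < BUFFER_LEN; ++i)
            sum += st.buffer[i];

        pthread_join(worker, NULL);
    }

    bsem_destroy(&st.producer_done);
    bsem_destroy(&st.consumer_done);
    return sum;
}
```

Calling `run_round_trip()` returns 56: the producer writes 0 through 7 (sum 28), and the consumer doubles each element before signaling back. Without the two semaphores, the producer could read the buffer before the consumer had finished with it, which is the kind of hazard semaphore-based interop is designed to prevent.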
OpenCL’s new External Memory Sharing functionality includes two sets of carefully structured extensions for handling semaphores and external memory.
Semaphore Extensions
These extensions add the ability to create OpenCL semaphore objects from OS-specific semaphore handles.
- cl_khr_semaphore - a new class of OpenCL object to represent semaphores, with wait and signal operations
- cl_khr_external_semaphore - extends cl_khr_semaphore with mechanisms for importing and exporting external semaphores (similar to VK_KHR_external_semaphore). The related extensions below extend cl_khr_external_semaphore with handle-type-specific behavior:
  - cl_khr_external_semaphore_opaque_fd (similar to VK_KHR_external_semaphore_fd) for sharing external semaphores using Linux file descriptor (fd) handles with reference transference
  - cl_khr_external_semaphore_win32 (similar to VK_KHR_external_semaphore_win32) for sharing external semaphores using Win32 NT and KMT handles with reference transference
  - cl_khr_external_semaphore_sync_fd for sharing external semaphores using sync_fd handles with copy transference
  - cl_khr_external_semaphore_dx_fence (similar to the D3D12 fence in VK_KHR_external_semaphore_win32) for sharing external semaphores using D3D12 fences.
External Memory Extensions
These extensions add the ability to create OpenCL memory objects from OS-specific memory handles. They have a similar design to the Vulkan external memory extension VK_KHR_external_memory.
- cl_khr_external_memory - imports external memory from other APIs. The related extensions below extend cl_khr_external_memory with handle-type-specific behavior:
  - cl_khr_external_memory_opaque_fd (similar to VK_KHR_external_memory_fd) for sharing external memory using Linux file descriptor (fd) handles
  - cl_khr_external_memory_win32 (similar to VK_KHR_external_memory_win32) for sharing external memory using Win32 NT and KMT handles
  - cl_khr_external_memory_dx for sharing external memory using Win32 D3D12 heap and resource handles
  - cl_khr_external_memory_dma_buf for sharing external memory using dma_buf handles.
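Because several of these extension names are prefixes of one another (cl_khr_external_memory and cl_khr_external_memory_opaque_fd, for instance), an application probing for support should match complete names rather than substrings. A common approach is to query the space-separated list returned by `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS` and compare whole tokens. The `has_extension` helper below is hypothetical and works on a plain string, so this sketch runs without an OpenCL driver.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API): returns true only if `name`
 * appears as a complete, space-delimited token in `ext_list`, such as the
 * string returned by clGetDeviceInfo(..., CL_DEVICE_EXTENSIONS, ...). */
bool has_extension(const char *ext_list, const char *name)
{
    if (ext_list == NULL || name == NULL)
        return false;
    size_t len = strlen(name);
    if (len == 0)
        return false;

    for (const char *p = strstr(ext_list, name); p != NULL;
         p = strstr(p + 1, name)) {
        /* A real match starts at the beginning of the list or after a
         * space, and ends at the end of the list or before a space. */
        bool starts = (p == ext_list) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return true;
    }
    return false;
}
```

Given the list `"cl_khr_semaphore cl_khr_external_memory_opaque_fd"`, a plain `strstr()` check would report `cl_khr_external_memory` as present, whereas `has_extension` returns false for it and true for `cl_khr_external_memory_opaque_fd`.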
The set of External Memory Sharing extensions has been released provisionally to gather developer feedback before finalization. Feedback is welcome via GitHub: https://github.com/KhronosGroup/OpenCL-Docs/.
Industry Support for New OpenCL Extensions
“Arm is committed to providing seamless developer access to high-quality compute technologies and the machine learning performance required to unlock new and advanced applications. Together with the Khronos OpenCL working group, we’ve spearheaded the extension of the OpenCL standard, taking heterogeneous compute applications to the next level of performance for years to come.”
“Imagination, as a pioneer in compute processing technologies, and a strong supporter of the OpenCL ecosystem, has always been a fast adopter of new OpenCL capabilities that unlock new possibilities for our customers across a wide range of markets. Once again, we have collaborated in the OpenCL working group to introduce new extensions for our application developers. We are particularly excited about the semaphore extensions that bring new levels of efficiency in OpenCL interoperability. A wide variety of machine learning, and other applications with heterogeneous computation requirements, will benefit greatly from these extensions, which we already optimally support.”
“Now that we have shipped conformant OpenCL 3.0 drivers, NVIDIA is committed to evolving significant new functionality to benefit all OpenCL developers. We have created the External Semaphore and Memory Sharing extensions together with the OpenCL Working Group for efficient interop with new generation APIs such as Vulkan, adding to the deployment flexibility already available through OpenCL’s interoperability with OpenGL.”
Drivers Coming to a Device Near You!
OpenCL drivers including both Integer Dot Product and External Memory Sharing extensions will soon be shipping from hardware vendors. We will post updates here as soon as they are available. We can’t wait to hear what you think!