SYCL 2020 - What you need to know

Today, Khronos released a major update to SYCL with the final SYCL 2020 specification, marking years of specification development, industry feedback, and evolution of the standard to bring valuable new features and greater alignment with ISO C++. As part of the announcement, we are also sharing the increased adoption and expansion of SYCL implementations that have been released in the past year. SYCL 2020 adds significantly more features and fixes to the specification, making integration with C++17 code and its parallel features easier and faster. Based on C++17, SYCL 2020 simplifies interoperability and extends C++ with heterogeneous capabilities for higher performance and greatly improved programmability for developers.

Want to know more about SYCL and what’s new with SYCL 2020? We put together our hot list of frequently asked questions to give you more insight into SYCL, how to use it, and its benefits to developers. This list will be updated as new questions and releases appear.

1. What is SYCL?

SYCL is an open standard specification published by the Khronos Group. It defines a single-source C++ programming layer that allows developers to use modern C++ features on a range of heterogeneous devices. Leveraging support from OpenCL and other backends, SYCL enables parallel execution on a range of hardware including CPUs, GPUs, DSPs, FPGAs, AI and custom chips, providing a foundation for creating efficient, portable and reusable middleware libraries and applications.

[Figure: SYCL Single Source C++ Parallel Programming]

2. Why should I use SYCL?

SYCL is a modern take on a programming model for heterogeneous and parallel hardware platforms and is currently used in HPC, AI/ML, Autonomous Vehicles, and embedded or customized devices. Developed by a consortium of industry, research, and academic leaders, it combines decades of experience and lessons learned from earlier approaches. The SYCL API is closely aligned with the ISO C++ specification and follows modern programming paradigms. It features high-level abstractions, easing many of the burdens commonly encountered in parallel programming, while still allowing for fine-grained control over performance and hardware features. Finally, SYCL represents an open collaborative effort to bring a single, unified programming model to a wide variety of hardware platforms and vendors.

3. What hardware can I target using SYCL?

The SYCL programming model is designed with a wide variety of hardware platforms in mind, ranging from accelerators, GPUs, DSPs and FPGAs to AI/ML and custom chipsets. The list of platforms supported by the existing SYCL implementations is continuously growing, and a snapshot from February 2021 of the currently supported hardware ecosystem can be seen below. New entries will be added as they appear with actual implementations.

[Figure: SYCL Implementations in Development]

4. What is SYCL 2020?

SYCL 2020 is the newest release of the SYCL specification, ratified by the Working Group in late 2020 and published by the Khronos Group in early 2021. It follows SYCL 1.2.1, the last version to be based directly on OpenCL (see also question 10: Does SYCL require OpenCL?). Previous releases were numbered after the OpenCL release they were based on. In light of the move to a more generalized backend model, and following ISO C++'s practice of naming releases by year, the project has adopted a year-based versioning scheme.

SYCL 2020 represents a major step forward, featuring over 40 new additions and improvements, including:

  • Unified Shared Memory (USM) enables code with pointers to work naturally without buffers or accessors
  • Parallel reductions add a built-in reduction operation to avoid boilerplate code and achieve maximum performance on hardware with built-in reduction operation acceleration
  • Work group and subgroup algorithms add efficient parallel operations between work items
  • Class template argument deduction (CTAD) and template deduction guides simplify class template instantiation
  • Simplified use of accessors reduces boilerplate code and streamlines the use of C++ software design patterns
  • Expanded interoperability enables efficient acceleration by diverse backend acceleration APIs
  • SYCL atomic operations are now more closely aligned to standard C++ atomics to enhance parallel programming freedom

SYCL 2020 increases expressiveness and simplicity for modern C++ heterogeneous programming.

5. What does SYCL stand for?

SYCL (pronounced "sickle") is an original name and not an acronym.

6. Where can I ask questions about SYCL?

The official Khronos forum is the best place to ask questions about SYCL.

The sycl.tech website lists many other ways to find out more about SYCL.

7. Where can I find benchmarks and other performance information about SYCL?

There are various research papers and projects that have been published, and several projects that continue to be developed offering benchmarks across different programming frameworks and platforms. There are some links to these projects and papers on the sycl.tech website.

There is also work done by Bristol University exploring the performance of SYCL. Good places to start are their latest study on performance portability, their evaluation of many programming models for memory bandwidth bound codes and their study looking at the performance of SYCL codes on GPUs from Intel, AMD and NVIDIA (watch the video).

8. How can I suggest new features or changes to the specification?

The SYCL specifications are open source and maintained through this GitHub project. Anyone can raise an issue on the project with a proposed change or addition to the specification and it will be discussed by the SYCL working group.

9. Can SYCL work without a target device such as a GPU or FPGA?

Yes. All existing SYCL implementations support a device that runs on the host CPU, so you don’t need a target offload device in order to use SYCL.

10. Does SYCL require OpenCL?

No. Earlier versions of SYCL were closely aligned with OpenCL, including special functionality for interoperability with OpenCL APIs. However, SYCL 2020 has transitioned to a generalized backend model, making OpenCL just one of many different potential programming models that SYCL can be built upon. Implementations based on other backends, for example OpenMP, CUDA and Level Zero, are already available.

11. Can you still use vendor specific extensions in SYCL?

Yes. Many implementers provide a range of vendor extensions for SYCL.

When using the OpenCL backend for SYCL, any vendor specific extensions that are available in OpenCL are still available in SYCL. All OpenCL intrinsics can be used within SYCL kernels just as developers would do in an OpenCL kernel. Portability and host execution can be ensured using preprocessor definitions.

12. Can I port my existing OpenCL kernels to SYCL?

Yes. SYCL provides two ways in which developers can port existing OpenCL C kernel functions.

Firstly, SYCL has an OpenCL C host-side target, which gives all the benefits of SYCL such as memory management and dependency tracking whilst allowing the kernel functions to be defined by traditional OpenCL C kernels.

Secondly, SYCL has full interoperability with OpenCL, meaning that at any point in a SYCL application, the equivalent OpenCL object can be retrieved from any SYCL object, allowing developers to use it with traditional OpenCL API functions. Additionally, SYCL objects feature constructors that take OpenCL objects; however, in some cases the developer is responsible for maintaining consistency between OpenCL objects and SYCL objects.

13. Is there an overhead when using SYCL compared to OpenCL?

The SYCL specification has been designed so that the overhead imposed by the SYCL host-side runtime over the underlying OpenCL implementation is minimal and so that there is no overhead to the kernel functions themselves. The research and benchmarks page on sycl.tech includes a paper showing the comparable results of running a PRK stencil using OpenCL and SYCL. SYCL offers higher abstraction and more expressiveness as it is based on modern C++.

14. Is there anything you can do with OpenCL you can’t do with SYCL?

No, SYCL provides the opportunity for interoperability between OpenCL and SYCL and the added advantages of a scheduler and type safety.

15. What C++ features does SYCL provide?

SYCL 2020 is based on ISO C++17 but implementations are free to provide even more recent versions of C++ like C++20.

One of the major benefits that SYCL provides is to enable developers to write kernels using a subset of standard C++ features.

Examples of standard C++ features supported in the current specification are templates, classes, operator overloading, static polymorphism and lambdas. SYCL 2020 further enables Class Template Argument Deduction (CTAD) and deduction guides from C++17.

16. Which C++ language features does SYCL not support?

SYCL does not impose any restrictions on the C++ language features that can be used on the host. For device kernels, certain restrictions apply, as in most other programming languages for devices. The excluded features include dynamic polymorphism (i.e. calls to virtual functions), function pointers, dynamic memory allocation, exceptions, and runtime type information, as their support cannot be guaranteed by the hardware on some devices.

However, by using the modern features of C++ there are alternative ways to achieve the same thing.

A way to achieve dynamic polymorphism in SYCL is described in this blog post. Alternatively, you can use, for example, C++17 std::variant with std::visit.
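To illustrate the std::variant approach, here is a minimal host-side sketch in plain C++17. The Square and Rect types and the area and total_area functions are hypothetical names made up for this example, and nothing here uses the SYCL API. Whether std::visit is accepted inside a kernel depends on the SYCL implementation and its standard library, so treat this as an illustration of the pattern rather than guaranteed device code.

```cpp
#include <cassert>
#include <variant>
#include <vector>

// Hypothetical shape types: no virtual functions and no common base class.
struct Square { int side; };
struct Rect { int width; int height; };

// A closed set of alternatives takes the place of an open class hierarchy.
using Shape = std::variant<Square, Rect>;

// Helper that merges several lambdas into a single overloaded callable.
template <class... Fs>
struct overloaded : Fs... {
    using Fs::operator()...;
};
template <class... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// std::visit selects the overload matching the alternative currently stored,
// so no vtable lookup is involved.
inline int area(const Shape& shape) {
    return std::visit(
        overloaded{
            [](const Square& s) { return s.side * s.side; },
            [](const Rect& r) { return r.width * r.height; }},
        shape);
}

// Sums the areas of a mixed collection of shapes.
inline int total_area(const std::vector<Shape>& shapes) {
    int sum = 0;
    for (const Shape& shape : shapes) sum += area(shape);
    return sum;
}
```

The trade-off is that the set of types must be known at compile time: adding a new shape means extending the Shape alias and every visitor, whereas a virtual base class can be extended without touching existing code.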

If you use function pointers, this blog post explains how to adapt your SYCL code.

Dynamic memory allocation cannot be used in device code.

Exceptions in SYCL can be both synchronous and asynchronous and are handled using an exception handler. An explanation of how to implement this in SYCL can be found in this guide.

17. Why does SYCL not support virtual functions?

SYCL host code is just standard C++ without restrictions and virtual methods can still be called statically on the device.

As OpenCL doesn't support calling functions through pointers inside device code, when SYCL targets OpenCL, any code that is compiled for the device cannot contain calls through function pointers or non-static calls to virtual functions. Like most offload accelerator languages, SYCL device code is essentially limited by the capabilities of the device, so if the device accelerator supported calling device functions through pointers, it would technically be possible to support that in SYCL. A possible implementation in SYCL could take advantage of the offload/mapping mechanisms that Codeplay used in its Offload toolchain for PlayStation 3, described in the paper https://dl.acm.org/citation.cfm?id=2174856. Another possibility is to use the Curiously Recurring Template Pattern (CRTP) or C++17 std::variant with std::visit.
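As a rough illustration of the CRTP option, the following plain C++ sketch replaces a virtual call with a call that is resolved at compile time. Operation, Doubler, Offset and apply_twice are hypothetical names chosen for this example; the code runs on the host and does not use any SYCL API.

```cpp
#include <cassert>

// CRTP base: the derived type is a template parameter, so the call to
// apply_impl is bound at compile time (static polymorphism, no vtable).
template <typename Derived>
struct Operation {
    int apply(int x) const {
        return static_cast<const Derived&>(*this).apply_impl(x);
    }
};

// Hypothetical operation that doubles its input.
struct Doubler : Operation<Doubler> {
    int apply_impl(int x) const { return 2 * x; }
};

// Hypothetical operation that adds a fixed amount to its input.
struct Offset : Operation<Offset> {
    explicit Offset(int a) : amount(a) {}
    int apply_impl(int x) const { return x + amount; }
    int amount;
};

// Generic code is written against Operation<Derived>; each instantiation
// calls the concrete apply_impl directly.
template <typename Derived>
int apply_twice(const Operation<Derived>& op, int x) {
    return op.apply(op.apply(x));
}
```

Because every call is bound statically, the compiler can inline it, and no function pointer is needed on the device. The cost is that the concrete type must be known where the generic code is instantiated.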

18. Does SYCL require C++ lambdas and templates?

Some posts suggest that SYCL requires both of these features, and that developers struggle with them. In short, the SYCL API itself relies on templates, but lambdas are optional: kernels can be written as functors instead.

The SYCL API makes substantial use of templates, which enable efficient static compilation and integration of host and device code whilst abstracting the implementation details. For example, the buffer/accessor templates enable the access of typed host data from the SYCL kernel. The parallel_for function template provides an interface to launch a functor as a SYCL kernel on the SYCL device. The SYCL kernel can be non-templated C++ code, but templates are a powerful tool to generate efficient and terse device code. For example, algorithm templates can be instantiated with different configuration options as template arguments, enabling more optimizations. This pattern is used by SYCL libraries like SYCL-BLAS.
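To make the idea of configuration options as template arguments more concrete, here is a small host-side sketch in plain C++. The tiled_dot function and its TileSize parameter are hypothetical and are not taken from SYCL-BLAS or any other library; they only show how a compile-time option can shape the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical blocked dot product. TileSize is a compile-time configuration
// option, so the inner loop has a constant trip count that the compiler is
// free to unroll or vectorize.
template <std::size_t TileSize, typename T>
T tiled_dot(const std::vector<T>& a, const std::vector<T>& b) {
    static_assert(TileSize > 0, "TileSize must be positive");
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    T sum{};
    std::size_t i = 0;
    // Full tiles: fixed-size inner loop.
    for (; i + TileSize <= n; i += TileSize) {
        for (std::size_t j = 0; j < TileSize; ++j) {
            sum += a[i + j] * b[i + j];
        }
    }
    // Remainder elements that do not fill a whole tile.
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Each instantiation, such as tiled_dot<2> or tiled_dot<4>, is a separate function specialized for its tile size, which is how one algorithm template can be tuned per device without run-time branching.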

C++ lambdas are not required by SYCL, but they are very convenient for creating the kernel functor objects as the data accessed by the lambda function is automatically captured. The alternative is writing a separate functor class which tends to be more verbose than the equivalent C++ lambda.
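The difference in verbosity can be seen in a small host-only sketch. The for_each_index helper below is a hypothetical, sequential stand-in for a kernel launch and is not the SYCL parallel_for API; ScaleFunctor and the two scale functions are likewise made-up names. The point is only to compare how state reaches the kernel body in each style.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in for a kernel launch: it calls the kernel
// once per index, sequentially. It is NOT the SYCL parallel_for API.
template <typename Kernel>
void for_each_index(std::size_t count, Kernel kernel) {
    for (std::size_t i = 0; i < count; ++i) kernel(i);
}

// Functor version: every piece of state the kernel needs is declared,
// stored and initialized by hand.
class ScaleFunctor {
public:
    ScaleFunctor(int* data, int factor) : data_(data), factor_(factor) {}
    void operator()(std::size_t i) const { data_[i] *= factor_; }

private:
    int* data_;
    int factor_;
};

inline std::vector<int> scale_with_functor(std::vector<int> values, int factor) {
    for_each_index(values.size(), ScaleFunctor(values.data(), factor));
    return values;
}

inline std::vector<int> scale_with_lambda(std::vector<int> values, int factor) {
    int* data = values.data();
    // Lambda version: data and factor are captured automatically by value.
    for_each_index(values.size(), [=](std::size_t i) { data[i] *= factor; });
    return values;
}
```

Both versions produce the same result. The functor costs more lines, but it gives the kernel a name and can be reused or placed in a header.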

Compared to SYCL 1.2.1, SYCL 2020 provides a terser syntax based on C++17, using for example CTAD (class template argument deduction) and deduction guides to avoid writing many template arguments explicitly.

Using the auto keyword also simplifies code, since fewer template types need to be spelled out.
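The effect of CTAD, deduction guides and auto can be shown with a small plain C++17 sketch. HostBuffer is a hypothetical stand-in written for this example and is not sycl::buffer; ctad_demo is likewise a made-up function name.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a typed container; it is not sycl::buffer.
template <typename T>
class HostBuffer {
public:
    explicit HostBuffer(std::vector<T> data) : data_(std::move(data)) {}

    template <typename It>
    HostBuffer(It first, It last) : data_(first, last) {}

    std::size_t size() const { return data_.size(); }
    const T& operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<T> data_;
};

// Deduction guide: T cannot be deduced from an iterator pair on its own,
// so the guide maps the iterator's value_type to T.
template <typename It>
HostBuffer(It, It) -> HostBuffer<typename std::iterator_traits<It>::value_type>;

inline std::size_t ctad_demo() {
    std::vector<double> source{1.0, 2.0, 3.0};

    // Without CTAD: the element type is written out.
    HostBuffer<double> explicit_buf(source.begin(), source.end());

    // With CTAD: T = double is deduced through the guide above.
    HostBuffer deduced_buf(source.begin(), source.end());

    // auto removes the remaining type spelling at the point of use.
    auto first = deduced_buf[0];

    static_assert(std::is_same_v<decltype(deduced_buf), HostBuffer<double>>);
    static_assert(std::is_same_v<decltype(first), double>);
    return explicit_buf.size() + deduced_buf.size();
}
```

The two buffers have exactly the same type; only the amount of template syntax the programmer has to write differs.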

19. What options does SYCL provide for parallelism?

SYCL uses a kernel-based programming model. As there are lots of different requirements for expressing parallelism in applications, SYCL provides four ways in which a kernel function can be executed. They are ordered from highest to lowest level of abstraction.

Basic Data Parallel: a kernel function is executed with a single range specifying the total amount of parallel work to perform. The work is defined by a kernel, expressed as a lambda function or a functor. Synchronization between the parallel work in the range is forbidden, although some devices provide support for atomic operations.

Work Group Data Parallel: a kernel function is executed using an "nd range". An nd range specifies a 1, 2 or 3 dimensional grid of work items that each executes the kernel function, which are executed together in work groups. The nd_range consists of two 1, 2 or 3 dimensional ranges: the global work size (specifying the full range of work items; the total amount of parallel work) and the local work size (specifying the number of work-items in each work group). In this execution mode, synchronization within a group can be performed using barriers.
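The relationship between the global work size, the local work size and the work groups comes down to simple index arithmetic. The sketch below works it through for one dimension in plain C++. WorkItemIndex, group_count, decompose and compose are hypothetical helper names, not SYCL API, and the code assumes the global size is an exact multiple of the local size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type for a one-dimensional decomposition.
struct WorkItemIndex {
    std::size_t group_id;  // which work group the item belongs to
    std::size_t local_id;  // position of the item inside its work group
};

// Number of work groups for a given global and local size. This sketch
// assumes the global size is an exact multiple of the local size.
inline std::size_t group_count(std::size_t global_size, std::size_t local_size) {
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

// Splits a global work-item id into its group id and local id.
inline WorkItemIndex decompose(std::size_t global_id, std::size_t local_size) {
    assert(local_size > 0);
    return WorkItemIndex{global_id / local_size, global_id % local_size};
}

// Rebuilds the global id from the group id and the local id.
inline std::size_t compose(WorkItemIndex index, std::size_t local_size) {
    return index.group_id * local_size + index.local_id;
}
```

For example, a global size of 1024 with a local size of 64 gives 16 work groups, and the work item with global id 130 is item 2 of group 2. A barrier only synchronizes the 64 items that share a group id.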

Hierarchical Data Parallel: we do not recommend this form in SYCL 2020 (because we are working on an even better solution… stay tuned). A kernel function executes in a work group data parallel way, but SYCL provides an alternative multi-level syntax for defining this form of parallelism. The hierarchical syntax consists of an outer parallel_for_work_group loop that is executed for each work group in the nd range and an inner parallel_for_work_item loop that is executed for each work item in the work group. The hierarchical syntax is a clearer way of writing parallel OpenCL code as it highlights the nature of the parallelism.

Single Task: a (serial) kernel function is executed just once. This is effectively the same as executing an nd range of global work size { 1, 1, 1 }, and is useful, for example, to launch a native AI kernel or an FPGA kernel where you want to handle the loop nests yourself.

In addition, SYCL also provides developers with a task-based programming model for expressing concurrency between kernels. By default, all kernels are submitted to an out-of-order queue with the ordering of tasks determined by data dependencies expressed using accessors. For SYCL applications using Unified Shared Memory (instead of the Buffer/Accessor model), the task-model is still available with the dependencies between kernels expressed using Events.

20. How does SYCL handle memory management?

SYCL has two approaches to sharing data between the host and device.

Firstly, memory is encapsulated within a higher-level abstraction that separates the storage and access of data. Just like in OpenCL, buffer and image objects are used to maintain data that is to be enqueued to a device. In SYCL, however, a buffer or image object can maintain multiple OpenCL buffers and images, and enqueuing is performed by accessor objects. This allows the host-side runtime to perform dependency tracking between kernel functions and therefore provide better synchronization.

Secondly, USM enables a pointer-based alternative to the buffer programming model. This makes it easier to integrate into existing code by representing allocations as pointers with full support for pointer arithmetic into allocations. There is also fine-grain control over ownership and accessibility of allocations. It is a simpler programming model that automatically migrates some allocations between SYCL devices and the host.

21. How does SYCL handle the execution of work?

SYCL maintains a task graph at runtime to coordinate scheduling of parallel work based on data or event dependencies.

In SYCL, programmers define a kernel and its dependencies inside command groups. A command group is defined either as a functor (class or struct) or as a lambda that takes a SYCL handler object, which manages the parallel execution. The handler is created at run time and is used to set the input and output accessor objects through which SYCL manages device memory.

Memory in SYCL is represented as sycl::buffer and sycl::image objects, which are mapped to accessors, or as Unified Shared Memory allocations, for use in SYCL kernels. Sharing this memory through accessors, or specifying event dependencies between enqueued commands in the command group, signals the need for synchronization. If all commands operate in the same memory space (global, local, or private), explicit synchronization is not needed and SYCL handles the data movement. However, if we include, for example, local memory accessors, we have to synchronize explicitly between the commands in the different memory spaces (e.g. global → local → global). Such synchronization can be performed by using sycl::barrier and sycl::mem_fence for the correct memory space (global_space, local_space, or global_and_local).

22. Can I use SYCL in my existing tool chain?

Yes. SYCL is designed to be as flexible as possible when it comes to integration within new or existing tool chains. In the case of a multi-compiler workflow with a separate compiler for host and device, any C++ host compiler can be integrated with a SYCL device compiler, provided the SYCL implementation supports it. Depending on the implementation, SYCL can also be library-only and integrate fully with CMake.

23. How can I debug SYCL code?

SYCL kernel functions are standard C++, therefore a SYCL kernel function can execute on the host CPU allowing developers to use traditional debuggers to debug their code. Implementers of SYCL may also provide additional debugging capabilities or tools.

24. Who can develop a SYCL implementation?

SYCL is a royalty-free open standard backed by a rich ecosystem of vendors and community contributors, with extensive resources for programming applications in SYCL. Anyone can develop an implementation of SYCL, but a company that wishes to have the protection of the Khronos IP Framework and use the SYCL name and logo on its implementation must be a SYCL Adopter and become conformant under the SYCL Adopters Program.

25. What are oneAPI and DPC++, and how do they relate to SYCL?

Intel is developing the oneAPI platform that offers a range of libraries, tools and programming models to help developers write code that can be deployed across a wide range of hardware such as CPUs, GPUs, FPGAs and AI processors. Part of this platform is Data-Parallel C++ (DPC++) which is an open-source implementation of the SYCL standard plus some custom vendor extensions. What this means is that if you write your code using standard SYCL, it will compile with any SYCL compiler, including DPC++.
