SYCL - The Dawn of a Unified Programming Model for Heterogeneous Modern C++ at SC19

In this guest blog, Michael Wong, Chair of the SYCL Working Group and Vice President of Research and Development at Codeplay Software Ltd, reflects on the evolution of SYCL in the past two years.

My name is Michael Wong, and in this blog I will talk about SYCL™, the Khronos® Group’s open standard for programming heterogeneous processors in “single-source” standard C++ and the SYCL working group’s activities. I have had the pleasure of chairing SYCL for the last four years, taking over from Codeplay’s Andrew Richards, shepherding a group of insanely talented people from many companies who are driving forward the technology of heterogeneous, modern C++. In this blog, I’ll tell you about my experience at SC19 with SYCL and Intel’s oneAPI that implements the SYCL standard. In future blogs, I would like to tell you more about SYCL features and future directions.

Typical Use of SYCL for Single Source C++ Parallel Programming

SC19 was my twelfth trip to the International Conference for High Performance Computing, Networking, Storage and Analysis, so I am no stranger to these events. I recall my first event was SC08, when I went representing IBM and OpenMP. You can check out some of my early blog posts about this, some of which I am almost embarrassed to re-read now, eleven years later.

Back in 2008, the big unveiling event was OpenCL, and this year, one of the big unveiling events was oneAPI, in which SYCL plays a significant part.

In a way, we have come full circle, first making C available to the heterogeneous world in 2008 through OpenCL and now making modern C++ available in 2019. During that time, we have gained a great deal of experience with making modern C++ increasingly heterogeneous, through OpenMP, Kokkos, Raja, C++ AMP, HPX, and now SYCL. And now we are in the process of bringing that experience back to ISO C++ itself. Here is a slide from my keynote at LLVM and DOE, highlighting the “Big Bang” explosion of C++ heterogeneous computing with the various C++ frameworks.

Personally, I have been dreaming of something like SYCL and oneAPI since I was involved with OpenMP. Even back then, I had wished for one unified programming model that could be used to serve both the HPC and the embedded industries. SYCL helps satisfy that goal, and Intel’s unveiling of oneAPI showed an intention to use SYCL to support many types of devices, from CPU to GPU to FPGA to dedicated AI processors.

While high performance computing has traditionally used FORTRAN, in the past few years there has been rapidly growing interest in using C++ for HPC applications. In fact, this year at SC19 there were many sessions talking about C++, and more specifically SYCL, with a range of sessions presenting research projects and providing a forum for discussion and education. Compared to last year’s SC18, where there were four SYCL talks, there was a virtual explosion of SYCL events.

Major announcements tend to happen during several key conferences, and this unveiling is no different. SuperComputing is the largest gathering of high performance computing experts in the world. Khronos, through SYCL (together with the new Khronos ANARI Working Group that is developing an API standard for portable scientific visualization), had major representation at this year’s SC event. Let’s review some of the highlights.

Intel Developer Conference

It started with Intel’s Developer Conference, a two-day conference on Sunday, November 17 and Monday, November 18. At this conference, Raja Koduri, senior vice president, chief architect, and general manager of Architecture, Graphics, and Software at Intel, presented a roadmap for oneAPI, an open software framework for programming CPUs, GPUs, FPGAs, and AI processors. At the heart of oneAPI is DPC++ (Data Parallel C++), which is Intel’s SYCL implementation using Clang. SYCL was presented as the programming model for all these different device types, framed as sequential, parallel, matrix, and spatial computing. Intel will make significant contributions to the SYCL ecosystem of software libraries, debugging, and analyzer tools.

There was a talk on DPC++ by James Brodman describing Intel’s SYCL implementation, which adds Intel vendor extensions and selective future C++ standard extensions. It is common for vendors to add key extensions to meet customer needs, which can then be potential candidates for future inclusion into the multi-vendor, open, SYCL standard.

One thing that seems to be lost in all the excitement is that this is not just ‘Intel’s’ oneAPI, but a standards-based interface that is open to any vendor to add layers to support their own devices.

There was an onstage panel consisting of Codeplay’s John Lawson, ANL’s Hal Finkel, Intel Fellow Geoff Lowney, and others that addressed how this important initiative will impact the HPC community.

Codeplay’s CEO, Andrew Richards, talked on the importance of ecosystem, and the room was overflowing. An important message was that an open ecosystem of languages and libraries can serve a larger community better than a language from any single vendor. We know this can be the case, but to be truly successful, it needs to be one that is based on open standards under collaborative, fair, multi-company governance—such as at Khronos.

Also launched around SC19 was a book on DPC++ and SYCL, which is free to download. The first four chapters have been completed by key SYCL WG members, as well as experts from Intel, including James R. Reinders, who came out of retirement to guide this work. The intention is to continue extending this book to educate users on how to make best use of the SYCL programming model.

H2RC Workshop

The Fifth International Workshop on Heterogeneous High-Performance Reconfigurable Computing (H2RC 2019) hosted two presentations focused solely on SYCL. The keynote was presented by Ronan Keryell from Xilinx, a Khronos member, who outlined the benefits of using SYCL for FPGA programming in a talk entitled, "SYCL: A Single-Source C++ Standard for Heterogeneous Computing." Later in the morning, Michael Kinsner and John Freeman from Intel presented, "Data Flow Pipes: A SYCL Extension for Spatial Architectures," describing the ‘pipes’ extension that enables a more usable and flexible data interface.

IA^3 Workshop

Monday, November 18, began with my keynote talk at the IA^3 2019 workshop. I described the effort towards adapting modern heterogeneous computing models in ISO C++ with features such as affinity, executors, and data locality. I talked about the “Four Horsemen of Heterogeneous Computing” from my LLVM keynote: data affinity, data locality, data placement, and data movement. I also described the evolution and future directions for SYCL. The room was full with about 100 people. There was a great deal of interest in one of the ISO C++ affinity papers, which addresses the data affinity problem and is working its way through the ISO committee. The intention is to collaborate based on work at PNNL on SHAD (the Scalable High-performance Algorithms and Data-structures Library), which is a similar library-based heterogeneous system.

Interviews

On Sunday, November 17, Codeplay CEO, Andrew Richards, was on stage at HPC Day with The Next Platform, conducting a live interview entitled, "Toward an Open Ecosystem for HPC and AI Compute."

This was followed on Wednesday with an interview where I gave my views on oneAPI and what its direction will bring to the SYCL standard. That video is now available on Vimeo and YouTube.

SYCL BoF

On Thursday, November 21, Khronos and the SYCL Working Group hosted a Birds of a Feather (BoF) session focused on the current and future plans for SYCL. This was chaired by Professor Simon McIntosh-Smith from the University of Bristol and me.

There was good attendance, with panelists from hardware companies, software companies, and research institutes. Many HPC programmers had not heard of SYCL, but with the increasing importance of modern C++ in HPC, as well as a desire to seek alternatives to proprietary languages, SYCL is becoming critical as a vendor-neutral framework to write C++ code that embraces heterogeneous parallelism. SYCL is an open standard, and there are multiple vendor implementations available, including open source projects.

In this BoF, experts and implementers explained SYCL’s advantages, how the language is governed, where it is going, and why you should be aware of it if you are intending to write C++ code for HPC machines.

There were about 200 people in the audience, and, given the depth and detail of questions asked, it was obvious many people already knew a lot about SYCL.

In the past, this BoF was actually rolled into the following Heterogeneous C++ BoF, but with the increasing popularity of SYCL—especially with oneAPI—we decided to break SYCL into its own session.

Heterogeneous and Distributed Computing in ISO C++ for HPC BoF

This was the third iteration of this BoF at SC, and previous iterations have focused on all the heterogeneous C++ frameworks, including Kokkos, Raja, HPX, CUDA, and SYCL. After two successful BoFs at SC17 and SC18, there was popular demand for updates on the progress of adding these capabilities into ISO C++, including task dispatch with executors and the property mechanism, data layout, affinity, error handling, and asynchronous execution.

The BoF also discussed the finalization of C++20, and support for distributed and heterogeneous computing from active participants in the standardization process. We also looked ahead on what is possible for C++23.

There were about 200-300 people in the audience with many great questions asked, indicating a high level of interest in C++ directions.

Performance, Portability, and Productivity Workshop

Finally, on the last day of SC19, Friday, November 22, there was a session comparing the implementation and performance of a Wilson Dslash Stencil Operator Mini-App using Kokkos and SYCL. Bálint Joó, from the Thomas Jefferson National Accelerator Facility, along with co-authors, presented their benchmark results and experience of developing this operator using both SYCL and Kokkos.

The View from the Chair’s Corner

Given the many events covering SYCL, oneAPI, and DPC++ at SC19, I would like to end this blog with some of my personal views (not necessarily those of Khronos or Codeplay) and impressions, though even I could be wrong in this very fast moving field. So, Caveat Emptor.

Probably the question I was asked most at SC19 and afterwards was related to oneAPI, so I will dive in and give my personal thoughts.

The spirit of oneAPI can be captured in three connected ideas:

  1. Providing full access and control over the capability of your compute devices...
  2. Across many many device kinds and...
  3. From a single high-level open standard language.

So what’s compelling about oneAPI? Intel’s oneAPI can be implemented on multiple platforms and is based on industry standards including SYCL and OpenMP, as well as a number of libraries.

DPC++ is Intel’s implementation of the SYCL open standard—whose openness flows from being under multi-company governance at Khronos. Any company can have a voice in SYCL’s evolution, and that flows into DPC++. What captures my imagination is that like SYCL, DPC++ is open, free, and enables anyone to build plugins to any Intel or non-Intel devices, embedded devices, FPGAs, and high performance AI devices.

What benefits will SYCL bring to Codeplay, my own company, which is working on connected autonomous vehicles (CAV) with machine learning and machine vision? For a long time, HPC and consumer computing have used different models, but lately, as AI is deployed in self-driving cars, they have begun to have very similar workloads. We have been championing SYCL for some time, and it has been successful within the embedded device, autonomous vehicle, and graphics domains where we have worked with Renesas to support R-Car, distributed by some Tier-1 OEMs.

Xilinx is building SYCL support for their FPGA hardware by layering over OpenMP and OpenCL using a SPIR/LLVM backend. Universities are free to explore SYCL, just as Heidelberg University has done, as the specification is free to use without being a Khronos member. In fact, there are multiple backends in development, on multiple low-level APIs in addition to OpenCL, including ROCm and CUDA. All this demonstrates the advantage of leveraging open standards that enable the flexible integration and deployment of multiple acceleration technologies.

The following figure (courtesy of the SYCL Working Group) shows available SYCL implementations:

  • DPC++ is Intel’s open source implementation using Clang and supports CPUs, multiple CPUs, GPUs and FPGAs through OpenCL with SPIR-V, and NVIDIA GPUs through CUDA.
  • ComputeCpp is Codeplay’s commercial implementation also based on Clang (though there is also a free download), which supports any CPU as well as a host of GPUs, FPGAs and specialized accelerators through OpenCL with SPIR or SPIR-V, as well as NVIDIA GPUs through PTX ingested through OpenCL.
  • Xilinx’s open source SYCL implementation is called triSYCL, using an OpenMP backend for any CPU and OpenCL with SPIR/LLVM for Xilinx FPGAs.
  • Heidelberg University has an implementation called hipSYCL that uses OpenMP for any CPU, CUDA for NVIDIA GPUs, and ROCm for AMD GPUs.

SYCL Implementations Provide Access to Diverse Processors Through Standard C++

Given that developers and the industry have been chasing the promise of cross-architecture development for a long time, you may ask: What is different, and why now? Well, for the first time, SYCL provides a high-level, general purpose programming framework that is capable of dispatch/offload to many devices in one high-level modern C++ open standard language. We need this kind of type-safe generic programming that enables diverse workloads—from traditional vector loops to AI convolutions.
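To make the single-source, type-safe, generic style a little more concrete, here is a minimal host-only sketch in standard C++. It is not SYCL code and does not use the SYCL API: `host_parallel_for` and `vector_add` are hypothetical names invented purely for illustration, and the plain loop stands in for the point where a SYCL runtime would dispatch the same kind of lambda kernel to a device.

```cpp
// Host-only illustration in standard C++17. This is NOT the SYCL API;
// every name below is hypothetical and exists only for this sketch.
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a device dispatcher: invokes `kernel` once per index.
// Here it is a sequential loop on the host; a heterogeneous runtime
// would instead schedule the work on a CPU, GPU, FPGA, or other device.
template <typename Kernel>
void host_parallel_for(std::size_t count, Kernel kernel) {
    for (std::size_t i = 0; i < count; ++i) {
        kernel(i);
    }
}

// Generic element-wise addition: one definition serves any element
// type T that supports operator+, and mismatched types are rejected
// at compile time.
template <typename T>
std::vector<T> vector_add(const std::vector<T>& a, const std::vector<T>& b) {
    assert(a.size() == b.size());  // both inputs must have the same length
    std::vector<T> out(a.size());
    // The "kernel" is an ordinary lambda living in the same source file
    // as the host code that launches it (the single-source idea).
    host_parallel_for(a.size(), [&](std::size_t i) { out[i] = a[i] + b[i]; });
    return out;
}
```

Calling `vector_add<int>(a, b)` or `vector_add<double>(a, b)` instantiates the same kernel body for each element type, which is the sort of compile-time checking and reuse that the paragraph above describes.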

DPC++ is Intel’s SYCL implementation with Intel extensions and ISO C++ elements, which will, in turn, feed back into the SYCL standard. SYCL remains an independent standard, taking input from many companies. SYCL uses modern C++ and, most importantly, a well-tested programming model for device offload that is based on single source, is type-safe, and enables generic programming. SYCL is rapidly tracking modern C++14, C++17, C++20, and, in future, C++23, to provide a type-safe, generic programming capability in the style of large-scale C++. SYCL aims to incorporate future C++ features, such as executors, for easy portability for modern C++ programmers. I think Intel has done the industry a great service in helping to democratize portable, heterogeneous compute across CPUs, GPUs, FPGAs, and AI devices.

As chair of SYCL and as a C++ Directions group member, I intend, together with the SYCL Working Group, to evolve SYCL and C++ in parallel by sharing learnings and experience, while embracing extension proposals from many companies, including Intel through DPC++, but also Xilinx, AMD, ARM, StreamHPC, Qualcomm, Codeplay, and the national research laboratories such as ANL, which is increasingly active in the SYCL working group. Codeplay and I also actively attend the OpenCL working group and learn from the many companies that are members there, including NVIDIA, Imagination, Adobe, Google, AMD, ARM, Qualcomm, Intel, Samsung and too many others to name.

Thinking back to 2008 when OpenCL burst onto the scene, few would have foreseen the growth of SYCL from OpenCL; but today, the two act as complementary programming models serving the modern C and C++ communities, and both are growing from strength to strength. SYCL arose from being layered over OpenCL, and many SYCL implementations will continue to use OpenCL, as it remains the open standard way of reaching down into low level heterogeneous hardware. Both SYCL and OpenCL enable ecosystems that are free and open for any vendor to support on their platform.

How to Connect!

The SYCL and OpenCL Working Groups are planning to be engaged with the community throughout the year, and are aiming to have an educational presence at the following upcoming events, whether virtually or in-person. I hope to see you there!

Or, ask questions or provide feedback on SYCL via the SYCL and OpenCL forum.

Acknowledgements

No blog with this much information can be achieved without a lot of feedback and help. In fact, people would be surprised how much time a good blog takes. This particular one took about 2 months. I would like to acknowledge the following people who took time to read one of the drafts, though I apologize if I missed anyone, as it is not intentional. So, in no particular order:

Ronan Keryell, Rod Burns, Caster Communications, Neil Trevett, James Brodman, Mike Kinsner, Andrew Richards, Gordon Brown, Ruyman Reyes, the SYCL working group, and many others who have offered suggestions and improvements. Any errors that remain are mine alone.


Khronos® and Vulkan® are registered trademarks, and ANARI™, WebGL™, glTF™, NNEF™, OpenVX™, SPIR™, SPIR-V™, SYCL™, OpenVG™ and 3D Commerce™ are trademarks of The Khronos Group Inc. OpenXR™ is a trademark owned by The Khronos Group Inc. and is registered as a trademark in China, the European Union, Japan and the United Kingdom. OpenCL™ is a trademark of Apple Inc. and OpenGL® is a registered trademark and the OpenGL ES™ and OpenGL SC™ logos are trademarks of Hewlett Packard Enterprise used under license by Khronos. All other product names, trademarks, and/or company names are used solely for identification and belong to their respective owners.