
CppCon 2021

October 24-29, 2021
Aurora Colorado & Online

This year, CppCon uses a hybrid format, with four tracks for onsite attendees and five tracks for online attendees.

Online attendees will be able to participate in onsite sessions via “simul-cast” for most sessions. A few onsite sessions will be recorded and rebroadcast for online attendees. Rebroadcast sessions will feature the presenters live in the session chat room and, time allowing, live Q&A at the end of the session. (Online attendees will have the ability to view recorded versions of all sessions–onsite and online–shortly after they happen.)


Khronos Related Sessions

Heterogeneous Modern C++ with SYCL 2020

Date and Time: Monday, October 25 • 9:00am - 10:00am
Speakers: Gordon Brown, (Codeplay Software Ltd.), Tom Deakin (University of Bristol), Nevin Liber (Argonne Leadership Computing Facility), Michael Wong (Codeplay, SYCL Working Group Chair, The Khronos Group)

Heterogeneous programming in C++ used to be hard and low-level. Today, Khronos SYCL for modern C++ works with different libraries, ML frameworks, and standard C++ code: template libraries with lambda functions separate host and accelerated device code in a single source file, while still enabling separate compilation of host and device code. The device SYCL compiler may employ kernel fusion for better performance, and the host CPU compiler can be any C++ compiler, such as Clang, GCC, Visual C++, or IBM XL. SYCL passes accelerated code to device OpenCL compilers or other backends to dispatch to a variety of devices.

C++20 on Xilinx FPGA with SYCL for Vitis

Date and Time:
Thursday, October 28 • 4:45pm - 5:45pm
Friday, October 29 • 12:00pm - 1:00pm
Speakers: Ronan Keryell (Xilinx Research Labs)

FPGAs (Field-Programmable Gate Arrays) are electronic devices programmable through a configuration memory to implement arbitrary electronic circuits. While they have been used for decades to implement various adaptable electronic components, they have more recently gained traction as generic programmable accelerators better suited to software programmers.

There are already HLS (High-Level Synthesis) tools that translate functions written in languages like C/C++ into equivalent electronic circuits, which can then be called from programs running on processors to accelerate parts of a larger application, often in an energy-efficient way. The current limitation is that there are two different programs: the host part, running the main application, and the device part, glued together with an interface library without any type-safety guarantee.

Since the C++ standard does not yet address the concepts of hardware heterogeneity and remote memory, the Khronos Group has developed SYCL, an open standard defining an executable DSL (Domain-Specific Language) in pure modern C++ without any extension. Around 10 different SYCL implementations target various devices, allowing a single-source C++ application to run on a CPU while controlling various accelerators (CPU, GPU, DSP, AI...) in a unified way, using different backends at the same time in a single type-safe C++ program.

We present a SYCL implementation targeting Xilinx Alveo FPGA cards, built by merging two different open-source implementations: Intel's oneAPI DPC++ and some LLVM passes from triSYCL.

For a C++ audience, this presentation gives a concrete example of why the C++ standard does not describe detailed execution semantics (stack, cache, registers...): because C++ can be executed on devices which are not even processors.

While this presentation targets FPGAs and a SYCL implementation from a specific vendor, the content also provides:
- a generic introduction to FPGAs, which should be interesting outside of Xilinx or even without the use of SYCL;
- how C++ can be translated into equivalent electronic circuits;
- a generic introduction to SYCL, which should be interesting for people who want to know more about heterogeneous programming and C++, beyond only FPGAs.

Beyond CUDA: GPU Accelerated Computing on Cross-Vendor Graphics Cards with Vulkan Compute (AMD, Qualcomm, NVIDIA & Friends)

Date and Time: Friday, October 29 • 10:30am - 11:30am
Speakers: Alejandro Saucedo (Chief Scientist, The Institute for Ethical AI & Machine Learning)

Many advanced data processing paradigms fit incredibly well with the parallel architecture that GPU computing offers, and exciting advancements in open-source projects such as Vulkan and Kompute are enabling developers to take advantage of general-purpose GPU computing capabilities in cross-vendor mobile and desktop GPUs (including AMD, Qualcomm, NVIDIA & friends). In this talk we provide conceptual and practical insight into the cross-vendor GPU compute ecosystem, as well as how to adopt these tools to add GPU acceleration to your existing C++ applications.

In this talk we will show how to write a simple GPU-accelerated machine learning algorithm from scratch that can run on virtually any GPU. We will give an overview of the projects making it possible to accelerate applications across cross-vendor GPUs. We'll show how you can get started with the full power of your GPU using the Kompute framework with only a handful of lines of C++ code, as well as provide an intuition for how optimizations can be introduced through the lower-level C++ interface.

As part of the more advanced example, we will showcase some optimizations that can be leveraged through the hardware capabilities of relevant graphics cards, such as concurrency-enabled GPU queues, which allow us to introduce 2x+ performance improvements in advanced data processing workloads. We will dive into the GPU computing terminology around asynchronous & parallel workflow processing, cover the core principles of data parallelism, explain the hardware concepts of GPU queues & queue families, and discuss how advancements in new and upcoming graphics cards will enable even bigger speedups (such as the AMD architectures or the NVIDIA Ampere GA10x architecture, which supports up to 3 parallel queue processing workloads).


Gordon Brown, Codeplay
Tom Deakin, University of Bristol
Ronan Keryell, Xilinx Research Labs
Nevin Liber, Argonne Leadership Computing Facility
Alejandro Saucedo, The Institute for Ethical AI & Machine Learning
Michael Wong, Codeplay
