Presentations and other assets from this event are presented here.
The SC Conference extends a warm welcome to our first-time attendees! SC started in 1988, and since then we’ve grown to host over 13,000 individuals at our events.
SC is one of the largest HPC conferences in the world. We host thousands of students, exhibitors, and presenters representing institutions of every shape and size. Our volunteers are committed to bringing you a broad range of experiences to help you get the most out of the event. Below, a few of our leaders share what they remember about their own first time at SC.
Time: Sunday, 14 November 2021 | 9:02am - 10:00am CST
Time: Sunday, 14 November 2021 | 11:00am - 11:30am CST
SYCL is an open-standard parallel programming model from Khronos for programming heterogeneous devices. It allows single-source programming of diverse attached devices in a cross-platform manner in modern C++. SYCL provides several layers of parallel abstraction, including Single Instruction, Multiple Thread (SIMT) kernels, data-parallel loop concurrency, and hierarchical parallelism. We discuss Scoped Parallelism as an extension to the existing hierarchical parallelism in SYCL, and highlight the advantages and disadvantages of these models from the perspectives of both the programmer and an implementer of SYCL. In this paper, we compare benchmark programs written in the SIMT kernel, hierarchical parallelism, and scoped parallelism paradigms, and present results from runs on a high-performance CPU and GPU.
Time: Sunday, 14 November 2021 | 11:30am - 12:00pm CST
To HPC and AI analytics engineers, math primitives such as basic linear algebra subprograms and random number generators are key functionalities with highly optimized implementations for different CPUs, GPUs, and other accelerators. However, because there is no industry-standard interface for math primitives, developers must deal with the different programming models and interfaces provided by each hardware vendor. We introduce the SYCL-based open-source interfaces for math primitives, the oneMKL open-source interface project, as a viable approach to bridging the cross-platform performance-portability gap for math primitives across HPC architectures. Exploiting SYCL's interoperability feature, the project enables the integration of optimized vendor-dependent libraries to maximize code reusability without compromising performance. Cross-platform performance portability was evaluated on two major HPC hardware platforms, an Intel CPU and an NVIDIA GPU, as well as an integrated Intel GPU. Our results show performance competitive with the native, optimized vendor-dependent libraries.
Time: Sunday, 14 November 2021 | 11:50am - 12:30pm CST
In this paper, we introduce a DPC++ matrix extension to unify different tensor hardware: Intel® Advanced Matrix Extensions (Intel® AMX) on CPUs, NVIDIA® Tensor Cores, IBM® POWER® MMA, and others. These tensor hardware units are usually accessed through low-level intrinsics or assembly to perform matrix operations, and it is hard for scientists to program such domain-specific devices without the kind of high-level abstractions and efficient implementations we introduce here. We also extend the existing LLVM matrix intrinsics to represent this DPC++ extension and to yield efficient Intel AMX code generation. Based on our case study of implementing this interface on Intel AMX hardware, we discuss some limitations of the existing LLVM Intermediate Representation (IR) and how they can be overcome to exploit tensor hardware.
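To illustrate why such an abstraction helps, the low-level route the abstract alludes to looks roughly like this on Intel AMX, where tiles are configured and driven entirely through compiler intrinsics (a sketch of the intrinsic style, not code from the paper; the tile shapes are arbitrary examples):

```cpp
// The style of code a high-level matrix extension hides: Intel AMX
// int8 tile multiply-accumulate via immintrin.h intrinsics.
#include <immintrin.h>
#include <cstdint>

// 64-byte tile-configuration block expected by _tile_loadconfig.
struct TileConfig {
  uint8_t palette_id;
  uint8_t start_row;
  uint8_t reserved[14];
  uint16_t colsb[16];  // bytes per row of each tile register
  uint8_t rows[16];    // rows of each tile register
};
static_assert(sizeof(TileConfig) == 64, "AMX config must be 64 bytes");

__attribute__((target("amx-tile,amx-int8")))
void amx_i8_mad(const int8_t* A, const int8_t* B, int32_t* C) {
  TileConfig cfg{};
  cfg.palette_id = 1;
  cfg.rows[0] = 16; cfg.colsb[0] = 64;  // tmm0: C, 16x16 int32
  cfg.rows[1] = 16; cfg.colsb[1] = 64;  // tmm1: A, 16x64 int8
  cfg.rows[2] = 16; cfg.colsb[2] = 64;  // tmm2: B, packed int8
  _tile_loadconfig(&cfg);

  _tile_loadd(1, A, 64);   // load A tile (stride in bytes)
  _tile_loadd(2, B, 64);   // load B tile
  _tile_zero(0);           // zero the accumulator tile
  _tile_dpbssd(0, 1, 2);   // C += A * B (signed int8 dot products)
  _tile_stored(0, C, 64);  // store the int32 result tile
  _tile_release();         // release tile state
}
```

Running this function requires an AMX-capable CPU; the point of the proposed DPC++ extension is that the programmer never writes tile configuration or register numbers by hand.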
Time: Sunday, 14 November 2021 | 2:11pm - 2:18pm CST
With the emergence of new hardware architectures in HPC systems, there is rising demand for software that lets developers use system resources efficiently. The SYCL programming model for heterogeneous systems provides specialization constants: runtime variables that are invariant during execution of the code on the target device. This property enables just-in-time (JIT) compiler optimizations for heterogeneous targets, such as using the optimal tile size in a tiled matrix multiplication kernel depending on the hardware used for execution. This paper describes the challenges of, and our solution for, implementing SYCL specialization constants in the oneAPI Data Parallel C++ (DPC++) compiler. We demonstrate how open specifications and open-source tools from the Khronos® Group, such as the SPIR-V™ specification and the SPIRV-LLVM translator, are leveraged in our implementation. We provide performance data measured for a generic convolution implementation and a version that uses specialization constants for the filter coefficients, highlighting the performance benefits of JIT compilation.
Time: Sunday, 14 November 2021 | 2:30pm - 3:00pm CST
In this paper, we introduce a GPU-friendly parallel implementation of Milc-Dslash that exposes multiple hierarchies of parallelism in the algorithm. Milc-Dslash was designed to serve as a benchmark, with highly optimized matrix-vector multiplications, for measuring resource utilization on GPU systems. The parallel hierarchies in the Milc-Dslash algorithm are mapped onto target hardware using the Kokkos and SYCL programming models. We present the performance achieved by the Kokkos and SYCL implementations of Milc-Dslash on an NVIDIA A100 GPU, an AMD MI100 GPU, and an Intel Gen9 GPU. Additionally, we compare the Kokkos and SYCL performance with that of versions written in the CUDA and HIP programming models on the NVIDIA A100 GPU and AMD MI100 GPU, respectively.
Time: Sunday, 14 November 2021 | 3:30pm - 4:00pm CST
High-performance computing (HPC) is a major driver accelerating scientific research and discovery, from quantum simulations to medical therapeutics. New HPC systems coming online are increasingly furnished with hardware components engineered by competing industry entities, each with its own architectures and platforms to support. While the increasing availability of these resources is in many cases pivotal to successful science, even the largest collaborations lack the computational expertise required to fully exploit current hardware capabilities. The need to maintain multiple platform-specific codebases further complicates matters, potentially constraining the number of machines that can be utilized. Fortunately, numerous programming models are under development that aim to facilitate software solutions for heterogeneous computing. One such model is SYCL, an open-standard, C++-based, single-source programming paradigm. Among SYCL's features is interoperability, a mechanism through which applications and third-party libraries coordinate data sharing and execute collaboratively. In this paper, we leverage the SYCL programming model to demonstrate cross-platform performance portability across heterogeneous resources. We detail our NVIDIA and AMD random number generator extensions to the oneMKL open-source interfaces library. Performance portability is measured relative to platform-specific baseline applications executed on four major hardware platforms using two different compilers that support SYCL. The utility of our extensions is exemplified in a real-world setting via a high-energy physics simulation application. We show that the performance of implementations that capitalize on SYCL interoperability is on par with that of native implementations, attesting to the cross-platform performance portability of a SYCL-based approach to scientific codes.
Time: Monday, 15 November 2021 | 4:15pm - 4:30pm CST
Field-programmable gate arrays (FPGAs) are increasingly targeted by high-level programming languages, including C++, OpenCL, SYCL, and DPC++. Device-side language constructs are often designed first to target graphics processing units (GPUs) because of their proliferation, so there are design gaps to fill when extending languages to target reconfigurable architectures. One key gap in the SYCL specification is the ability to declare memory shared between kernels or functions on a single device, which can be implemented using the efficient on-chip reconfigurable memory resources of an FPGA. This talk will describe a new language extension for DPC++, to be proposed for the next SYCL specification, that enables device-scope memory allocations accessed as if they were global variables. This feature is important for spatial architectures in two ways: (1) it enables construction and additional optimization using on-chip memory resources; and (2) it allows semantics to be defined around reprogramming and initialization of data in a way that is efficient for reconfigurable architectures. The talk will detail the semantics and optimization opportunities enabled by the extension, aspects of lifetime and initialization, controls enabling optimization on FPGA (both hints and semantic modifiers), and divergence from the existing OpenCL/SPIR-V program/module-scope variable features. These benefits combine to close a recurring gap in the device language across architectures. The device_global feature enables both performance and usability in common coding patterns, and is the result of significant work that aims to inform the next version of the SYCL specification.
Time: Tuesday, 16 November 2021 | 5:15pm - 6:45pm CST
SYCL is an open standard whose latest release, SYCL 2020, arrived in February 2021. After successful ISO C++ and SYCL BoFs at SC17, SC18, SC19, and SC20, and with the increasing use of C++ in HPC, there was popular demand for an update on the new SYCL 2020 features. These features mean developers will be able to write their HPC software against the SYCL standard and run that same software on the forthcoming all-Intel Aurora supercomputer at Argonne National Laboratory, on systems at NERSC, LBL, and ORNL, and potentially on supercomputers with other architectures, including AMD, Arm, or RISC-V.