Skip to main content

Khronos Blog

Ray Tracing In Vulkan

Updated: December 15, 2020
This blog has been updated to include the latest techniques available (Originally posted March 17, 2020).

Vulkan SDK shipping today with ray tracing support including Validation Layers.


The Khronos Vulkan Ray Tracing Task Sub Group (TSG) has developed and released a set of extensions that seamlessly integrate ray tracing functionality into the existing Vulkan framework. This blog summarizes how the Vulkan Ray Tracing extensions were developed, and illustrates how they can be used by developers to bring ray tracing functionality to their applications.

The Ray Tracing TSG was formed in early 2018 and tasked to bring a tightly integrated, cross-vendor, ray tracing solution to Vulkan. In March 2020, the provisional ray tracing extensions were released to gather public review and input into the final design of these extensions. On November 23, 2020, the final specifications were released and on December 15, 2020 the Vulkan SDK was announced, which incorporated support for these extensions, enabling developers to be able to easily integrate Vulkan Ray Tracing into their applications.

The TSG received a number of design contributions from IHVs and examined requirements from both ISVs and IHVs. Real-time techniques for ray tracing are still being actively researched and so the first version of Vulkan Ray Tracing has been designed to provide an effective framework, while setting an extensible stage for future developments.

One overarching design goal was to provide a single, coherent cross-platform and multi-vendor framework for ray tracing acceleration that could be easily used together with existing Vulkan API functionality. We enabled selected parts of the framework to be optional for deployment flexibility in keeping with the Vulkan philosophy. For this first version, we are primarily aiming to expose the full functionality of modern desktop hardware.

ISVs were also very clear—we needed to enable content using contemporary proprietary APIs, such as NVIDIA OptiX™ or Microsoft DirectX Raytracing, to be easily portable to Vulkan. Consequently we used a familiar overall architecture, including the re-use of HLSL shaders, while also introducing new functionality and implementation flexibility.

One critical use case for Vulkan Ray Tracing is real-time ray tracing in games - typically using a hybrid combination of a rasterized scene with some ray traced aspects. Some examples include rasterization post-processing after tracing primary rays, using ray tracing for shadow map generation, and dynamic light baking asynchronously with other system tasks. See Figures 1 and 2 for an example of hybrid rendering using Vulkan in Wolfenstein: Youngblood.

Vulkan Ray Tracing can also be used for accelerating offline production rendering and creative tools, for example offline light-map baking. There are many additional innovative techniques that can leverage an accelerated ray tracing framework, including non-rendering techniques. We look forward to seeing how you put these extensions to use!

Figure 1: Wolfenstein: Youngblood - Ray Tracing OFF
Figure 2: Wolfenstein: Youngblood - Ray Tracing ON.
Note the improved reflections on the metal walls, on the floor through the doorway, and on the windows inside the room. It also improves the overall lighting in the scene and eliminates the lighting artifacts on the sides and floor of the doorway.

Introduction to the Vulkan Ray Tracing Extensions

The Vulkan Ray Tracing functionality consists of a number of Vulkan, SPIR-V, and GLSL extensions.

The three primary Vulkan extensions are:

  • VK_KHR_acceleration_structure - which provides functionality for acceleration structure building and management
  • VK_KHR_ray_tracing_pipeline - providing ray tracing shader stages and pipelines, and
  • VK_KHR_ray_query - providing ray query intrinsics for all shader stages.

Some of this functionality is optional, so be sure to check for the supported features and properties on your driver!

The VK_KHR_acceleration_structure extension provides a common base for acceleration functionality that is used by both the VK_KHR_ray_tracing_pipeline and VK_KHR_ray_query extensions. The features of this extension are described by the VkPhysicalDeviceAccelerationStructureFeaturesKHR structure:

  • accelerationStructure indicates whether the implementation supports the acceleration structure functionality. This is required by all implementations supporting this extension.
  • accelerationStructureCaptureReplay indicates whether the implementation supports saving and reusing acceleration structure device addresses for trace capture and replay. This optional functionality is intended to be used by tools and not by applications directly.
  • accelerationStructureIndirectBuild indicates whether the implementation supports indirect acceleration structure build commands (vkCmdBuildAccelerationStructuresIndirectKHR).
  • accelerationStructureHostCommands indicates whether the implementation supports host side acceleration structure commands (vkBuildAccelerationStructuresKHR, vkCopyAccelerationStructureKHR, vkCopyAccelerationStructureToMemoryKHR, vkCopyMemoryToAccelerationStructureKHR, and vkWriteAccelerationStructuresPropertiesKHR). These CPU-based commands are optional, but the device (GPU) versions of these commands (vkCmd*) are always supported.
  • descriptorBindingAccelerationStructureUpdateAfterBind indicates whether the implementation supports updating acceleration structure descriptors after a set is bound. If this required feature is not enabled, the VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT must not be used with VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR.

It should also be noted that this extension also requires the descriptorIndexing and bufferDeviceAddress features from Vulkan 1.2 (or precursor extensions) to be supported. Vulkan 1.1 and VK_KHR_spirv_1_4 are also required.

The features provided by the VK_KHR_ray_tracing_pipeline extension are described by the VkPhysicalDeviceRayTracingPipelineFeaturesKHR structure:

  • rayTracingPipeline indicates whether the implementation supports the ray tracing pipeline functionality, including the ray tracing shader stages and pipelines, and the SPV_KHR_ray_tracing SPIR-V extension. This is required by all implementations supporting this extension.
  • rayTracingPipelineShaderGroupHandleCaptureReplay indicates whether the implementation supports saving and reusing shader group handles for trace capture and replay. This optional functionality is intended to be used by tools and not by applications directly.
  • rayTracingPipelineShaderGroupHandleCaptureReplayMixed indicates whether the implementation supports reuse of shader group handles being arbitrarily mixed with creation of non-reused shader group handles. If this is VK_FALSE, all reused shader group handles must be specified before any non-reused handles may be created.
  • rayTracingPipelineTraceRaysIndirect indicates whether the implementation supports indirect trace ray commands, e.g. vkCmdTraceRaysIndirectKHR. This is required to be supported.
  • rayTraversalPrimitiveCulling indicates whether the implementation supports the SkipTrianglesKHR and SkipAABBsKHR ray flags for primitive culling during ray traversal. This is required to be supported if the VK_KHR_ray_query is also supported. In GLSL, these are enabled by specifying the primitive_culling layout using the GLSL_EXT_ray_flags_primitive_culling extension.

There is only one feature provided by the VK_KHR_ray_query extension which is described by the VkPhysicalDeviceRayQueryFeaturesKHR structure:

  • rayQuery indicates that the implementation supports the ray query functionality provided by the SPV_KHR_ray_query SPIR-V extension, in all shader stages (including ray tracing pipelines, if supported). This is required by all implementations supporting this extension.

There are also a number of queryable properties added by these extensions. They are described by the VkPhysicalDeviceAccelerationStructurePropertiesKHR and VkPhysicalDeviceRayTracingPipelinePropertiesKHR structures. Some notable items here are that the size of the shader header (shaderGroupHandleSize) is required to be exactly 32, and the maximum recursion depth (maxRayRecursionDepth) is only required to be 1.

There are two additional extensions which have been added as building blocks for additional functionality. These extensions add infrastructure but do not enable the functionality on their own. Future extensions may build upon these extensions in other areas of the API, but at this point they enable specific uses of this functionality for ray tracing only. VK_KHR_deferred_host_operations allows expensive driver operations to be offloaded to application-managed CPU thread pools which can enable work to be done on background threads or parallelized across multiple cores. With ray tracing this can be used for ray tracing pipeline compilation or CPU-based acceleration structure construction. VK_KHR_pipeline_library provides the ability to provide a set of shaders which can be linked into pipelines. With ray tracing it can be useful when incrementally constructing ray tracing pipelines.

Shaders for use with the ray tracing extensions are supplied to the API as SPIR-V binaries which use two new SPIR-V extensions:

Developers can generate these binaries using either GLSL or HLSL. For GLSL, there are three new GLSL extensions: GLSL_EXT_ray_tracing, GLSL_EXT_ray_query, and GLSL_EXT_ray_flags_primitive_culling, which are supported in the open source glslang compiler. HLSL support is also provided via DXC, Microsoft's open source HLSL compiler, allowing Vulkan ray tracing shaders to be authored in HLSL using the syntax defined by Microsoft, with minimal modifications. Both glslang and DXC are included as part of the Vulkan SDK .

The following sections of this document go into more detail on the ray tracing functionality including creating and using acceleration structures, host and deferred operations, ray traversal, ray tracing pipelines, and ray queries.

Acceleration Structures

To achieve high performance on complex scenes, ray tracing performs ray intersections against an optimized data structure built over the scene information called an acceleration structure (AS). The acceleration structure is divided into a two-level hierarchy as shown in Figure 3. The lower level, the bottom-level acceleration structure, contains the triangles or axis-aligned bounding boxes (AABBs) of custom geometry that make up the scene. Since each bottom level acceleration structure may correspond to what would be multiple draw calls in the rasterization pipeline, each bottom level build can take multiple sets of geometry of a given type. The upper level, the top-level acceleration structure, contains references to a set of bottom-level acceleration structures, each reference including shading and transform information for that reference.

Figure 3: Acceleration Structure Hierarchy

Building either type of acceleration structure results in an opaque, implementation-defined format in memory. The bottom-level acceleration structure is only used by reference from the top-level acceleration structure. The top-level acceleration structure is accessed from the shader as a descriptor binding or by device address (obtained via vkGetAccelerationStructureDeviceAddressKHR).

To create an acceleration structure, the application must first determine the sizes required for the acceleration structure. For builds, the size of the acceleration structure and the scratch buffer sizes for builds and updates are obtained in the VkAccelerationStructureBuildSizesInfoKHR structure via the vkGetAccelerationStructureBuildSizesKHR command. The shape and type of the acceleration structure to be created is described in VkAccelerationStructureBuildGeometryInfoKHR structure. This is the same structure that will later be used for the actual build, but the acceleration structure parameters and geometry data pointers do not need to be fully populated at this point (although they can be), just the acceleration structure type, and the geometry types, counts, and maximum sizes. These sizes are valid for any sufficiently similar acceleration structure. For acceleration structures that are going to be the target of a compacting copy, the required size can be obtained via the vkCmdWriteAccelerationStructuresPropertiesKHR command. Once the required sizes have been determined, the application creates a VkBuffer for the acceleration structure (accelerationStructureSize), and VkBuffer(s) as needed for the build (buildScratchSize) and update (updateScratchSize) scratch buffers.

Next, the VkAccelerationStructureKHR object can be created using the vkCreateAccelerationStructureKHR command which creates an acceleration structure of the specified type and size and places it at offset within the buffer provided in VkAccelerationStructureCreateInfoKHR. Unlike most other resources in Vulkan, the specified portion of the buffer fully provides the memory for the acceleration structure; no additional memory requirements need to be queried or memory bound to the acceleration structure object. If desired, multiple acceleration structures can be placed in the same VkBuffer, provided the acceleration structures do not overlap.

Builds are performed with vk{Cmd}BuildAccelerationStructuresKHR. For a bottom-level acceleration structure, the vertex data for triangles or the extent information for the AABBs is pulled from a buffer. A top-level acceleration structure pulls the shading, transform, and reference information for each instance from a structure in a buffer. An update to an acceleration structure is performed using the same functions with a special flag to indicate that an update of the positions from the existing acceleration structure is required.

Because the acceleration structure memory is in an implementation-defined, opaque format, there is a set of functions to perform operations on the acceleration structure data: vk{Cmd}CopyAccelerationStructureKHR, vk{Cmd}CopyAccelerationStructureToMemoryKHR, and vk{Cmd}CopyMemoryToAccelerationStructureKHR.

In addition to a basic copy, these functions can perform a restricted form of serialization and deserialization to save and restore accelerations structures with specific version compatibility requirements (see vkGetDeviceAccelerationStructureCompatibilityKHR).

Acceleration structures may also end up with more space reserved for them than is required, so for large static acceleration structures, it can be beneficial to reduce the final amount of space used. An application can use vk{Cmd}WriteAccelerationStructuresPropertiesKHR to query the final compacted size then use vk{Cmd}CopyAccelerationStructureKHR to compact the acceleration structure.

Host Operations

Acceleration structures are very large resources, and managing them requires significant processing effort. Scheduling this work on a device alongside other rendering work can be tricky, particularly when host intervention is required. Vulkan provides both host and device variants of acceleration structure operations, allowing applications to better schedule these workloads. The device variants (vkCmd*AccelerationStructure*KHR) are enqueued into command buffers and executed on the device timeline, and the host variants (vk*AccelerationStructure*KHR) are executed directly on the host timeline.

Deferred Operations

Performing acceleration structure builds and updates on the CPU is a workload that is relatively easy to parallelize, and we wanted to be able to take advantage of that in Vulkan. An application can execute independent commands on independent threads, but this approach requires that there be enough commands available to fully utilize the machine. It can also lead to imbalanced loads, since some commands might take significantly longer than others.

In order to avoid these snags, we added deferred operations to enable intra-command parallelism: spreading work for a single command across multiple CPU cores. A driver-managed thread pool is one way to achieve this, but is not in keeping with the low-level explicit philosophy of Vulkan. Applications also run their own thread pools, and it is preferable to enable these threads to perform the work, so that the application can manage the execution of driver work together with the rest of its load.

Deferred host operations are designed around a “division of labor” principle. The application is responsible for:

  • Setting up commands and requesting deferred execution.
  • Assigning worker threads to execute deferred commands.
  • Setting priorities and CPU budgets as it sees fit, by choosing which tasks to execute, and when to execute them.

The driver is responsible for:

  • Tracking the execution state of a deferred command.
  • Implementing distributed execution, whatever parallel constructs are most appropriate for the workload (tasks, parallel loops, dependency graphs, work queues, and the like).

In this way, the application controls the allocation and prioritization of work, but the driver manages the low-level details.

To use deferred operations, the application first constructs a VkDeferredOperationKHR object, which encapsulates the execution state of a deferred command. This object will be in one of two states (Complete or Pending) throughout its life cycle, as shown in Figure 4.

Figure 4: Deferred Operation State Diagram

A deferred operation is constructed in the Complete state. The application issues a deferral request for a command by passing a VkDeferredOperationKHR object to the command. If the driver honors the deferral request, the deferred operation transitions to the Pending state. Note that drivers are free to deny the request and simply execute the command in place, causing it to immediately become complete.

Once deferred, an operation will not progress until the application joins a thread to it by calling vkDeferredOperationJoinKHR. The join command instructs the driver to use the calling thread to process the command associated with a given deferred operation. An application may join any number of threads to a deferred operation, and doing so will generally cause the command to complete more quickly. The operation becomes Complete whenever at least one joined thread has observed a VK_SUCCESS return value from vkDeferredOperationJoinKHR. Note that if multiple threads have joined the deferred operation, the implementation may return early from the join if it knows that it has more threads joined than it is able to utilize.

Use Case: Simplified Compaction

Compaction is a very important optimization for reducing the memory footprint of ray tracing acceleration structures. Acceleration structure construction with compaction looks like this:

  1. Determine the worst-case memory requirement for an acceleration structure
  2. Allocate device memory
  3. Build the acceleration structure
  4. Determine the compacted size
  5. Synchronize with the GPU
  6. Allocate device memory
  7. Perform a compacting copy

In order to allocate memory for a compacted acceleration structure, an application needs to know its size. To determine the size, it needs to submit a command buffer for steps 3 and 4, and wait for it to finish.

This detail causes alarm bells to ring in the minds of experienced engine developers. If done naively, this sort of host/device handshaking can seriously degrade performance. If done well, it is a significant source of complexity, and can cause spikes in an application’s device memory footprint, because the uncompacted acceleration structures need to live in device memory for at least one frame.

Host builds allow us to remove both of these drawbacks. Using host builds, we can implement compaction by performing the initial build on the host, and then performing a compacting copy from host memory to device memory. This copy still requires monitoring so that the app can recover the host memory, but this is a more familiar pattern, one which engines already implement for uploading texture and geometry data to the device.

Use Case: Load Balancing

Host acceleration structure builds provide opportunities to improve performance by leveraging otherwise idle CPUs. Consider a hypothetical profile from a game:

Figure 5: Load balancing: No Host Build

In Figure 5, acceleration structure construction and updates are implemented on the device, but the application has considerable CPU time to spare. Moving these operations to the host allows the CPU to execute the next frame’s acceleration structure work in parallel with the previous frame’s rendering. This can improve throughput, even if the CPU requires more wall-clock time to perform the same task, as shown in Figure 6.

Figure 6: Load Balancing: Host Build Enabled

Ray Traversal

Tracing a ray against an acceleration structure in Vulkan goes through a number of logical phases, giving some flexibility on how rays are traced. Intersection candidates are initially found based purely on their geometric properties - is there an intersection along the ray with the geometric object described in the acceleration structure?

Intersection testing is watertight in Vulkan - meaning for a single geometric object described in an acceleration structure, rays cannot leak through gaps between triangles, and multiple hits can not be reported for different triangles at the same position. This is not guaranteed for neighboring objects that happen to abut, but it means individual models will not have holes in them, or be shaded excessively.

Once a candidate is found, a series of culling operations occur before the intersection is confirmed. These culling operations discard candidates based on flags used for traversal, and properties of the acceleration structure; details of these are in the specification. Remaining opaque triangle candidates are confirmed as valid intersections; whereas AABBs and non-opaque triangles require shader code to programmatically determine whether a hit occurred.

Traversal proceeds until all possible candidates are found and either confirmed or discarded, and a closest hit is determined. Traversal can also be made to end early to avoid unnecessary processing. This can be useful for detecting occlusion, or as an optimization in certain cases.

Tracing rays and getting traversal results can be done via one of two mechanisms in Vulkan; Ray Tracing Pipelines and Ray Queries (see Figure 7):

  • Ray queries provide direct access to ray traversal logic in any shader stage, allowing them to be plugged into existing shaders and enhancing the effects those shaders express.
  • Ray tracing pipelines provide a dedicated ray tracing mechanism with dynamic shader selection, enabling significant flexibility in the materials used in a scene and programmable intersection logic.
Figure 7: Ray Tracing Architecture in Vulkan

Ray Tracing Pipelines

Applications can associate specific shaders with objects in a scene, defining things like material parameters and intersection logic for those objects. As traversal progresses, when a ray intersects an object, associated shaders are automatically executed by the implementation (see Figure 8). A ray tracing pipeline is similar to a graphics pipeline in Vulkan, but with added functionality to manage having significantly more shaders and to put references to specific shaders into memory.

Ray tracing pipeline work is launched using vkCmdTraceRaysKHR with a currently bound ray tracing pipeline. This command invokes an application-defined set of ray generation threads, which can call traceRaysEXT() from the shader, starting traversal work on the specified acceleration structure. During traversal, if required by the trace and acceleration structure, application shader code in an intersection and any hit shaders can control how traversal proceeds. After traversal completes, either a miss or closest hit shader is invoked.

Callable shaders may be invoked using the same shader selection mechanism, but outside of the direct traversal context.

The different shader stages can communicate parameters and results using ray payload structures between all traversal stages and ray attribute structures from the traversal control shaders.

Figure 8: Ray Tracing Pipeline flow diagram.

To enable the traversal phase to know which shader to invoke after a given step of traversal to control or respond to the traversal, the implementation uses a shader binding table. Each shader entry consists of a shader group handle queried from the implementation for a given shader group plus an optional shader buffer record which the application may use for instance-specific data such as buffer device addresses or descriptor indices. The address for any given shader is computed by the traversal through a combination of parameters to the trace rays API call, parameters to the traceRayEXT shader call, and information stored in the acceleration structure.

Ray tracing pipelines can be created directly as with other pipeline types, but because ray tracing pipelines can have orders of magnitude more shaders than other pipelines types and we may want to add shaders, the extension adds another mechanism: pipeline libraries. A pipeline library is a pipeline including state and shaders with an additional flag to indicate that it is not intended to be bound directly to the API but is intended to be used as a library of code to be included in a later pipeline. Pipeline libraries can be used in multiple ray tracing pipelines, allowing reuse of shader compilation in multiple pipelines. A ray tracing pipeline creation may include a set pipeline library pipelines in the creation as well as a set of ray tracing shaders. All of the compile state from each shader must match to create a compatible final pipeline. In addition to pipeline libraries, deferred host operations can be used in ray pipeline construction to enable further parallelization.

Note that while pipeline libraries are exposed as a separate extension, they are only currently integrated for use with ray tracing pipelines.

Example Ray Pipeline Shaders (GLSL)

// Ray generation shader
#version 460 core
#extension GL_EXT_ray_tracing : enable
layout(location = 0) rayPayloadEXT vec4 payload;
layout(binding = 0, set = 0) uniform accelerationStructureEXT acc;
layout(binding = 1, rgba32f) uniform image2D img;
layout(binding = 1, set = 0) uniform rayParams
    vec3 rayOrigin;
    vec3 rayDir;
    uint sbtOffset;
    uint sbtStride;
    uint missIndex;
void main() {
    traceRayEXT(acc, gl_RayFlagsOpaqueEXT, 0xff, sbtOffset,
                sbtStride, missIndex, rayOrigin, 0.0,
                computeDir(rayDir, gl_LaunchIDEXT, gl_LaunchSizeEXT),
                100.0f, 0 /* payload */);
    imgColor = payload + vec4(blendColor) ;
    imageStore(img, ivec2(gl_LaunchIDEXT), payload);

// Closest hit shader
#version 460 core
#extension GL_EXT_ray_tracing : enable
layout(location = 0) rayPayloadInEXT vec4 payload;

void main() {
    payload = vec4(0.0, 1.0, 0.0, 1.0);

// Miss shader
#version 460 core
#extension GL_EXT_ray_tracing : enable
layout(location = 0) rayPayloadInEXT vec4 payload;

void main() {
    payload = vec4(0.0, 0.0, 0.0, 0.0);

Ray Queries

Ray queries can be used to perform ray traversal and get a result back in any shader stage. Other than requiring acceleration structures, ray queries are performed using only a set of new shader instructions.

Ray queries are initialized with an acceleration structure to query against, ray flags determining properties of the traversal, a cull mask, and a geometric description of the ray being traced.

Properties of potential and committed intersections, and of the ray query itself, are accessible to the shader during traversal, enabling complex decision making based on what geometry is being intersected, how it is being intersected, and where (see Figure 9).

Figure 9: Ray Query flow diagram

Ray Query Example (GLSL)

The following is an incomplete example of ray queries in GLSL, illustrating how a shader could use ray queries to detect whether a given position is in shadow or not. This could be added to a fragment shader to feed into lighting calculations. The overall structure for most ray queries will usually be similar - initialize, proceed in a loop, then make a final determination.

rayQueryEXT rayQuery;
rayQueryInitializeEXT(rayQuery, accelerationStructure,
                      cullMask, origin, tMin, direction, tMax);

while(rayQueryProceedEXT(rayQuery)) {
    if (rayQueryGetIntersectionTypeEXT(rayQuery, false) ==
        ... // Determine if an opaque triangle hit occurred
        if (opaqueHit) rayQueryConfirmIntersectionEXT(rayQuery);
    else if (rayQueryGetIntersectionTypeEXT(rayQuery, false) ==
        ... // Determine if an opaque hit occurred in an AABB
        if (opaqueHit) rayQueryGenerateIntersectionEXT(rayQuery, ...);    

if (rayQueryGetIntersectionTypeEXT(rayQuery, true) ==
    // Not shadow!
} else {
    // Shadow!

Call to Action!

The Vulkan Software Development Kit (SDK) version and later now fully support the new Vulkan Ray Tracing extensions, including Validation Layers and integration of upgraded GLSL, HLSL and SPIR-V shader tool chains. The Khronos open source Vulkan Samples and Vulkan Guide have been upgraded to illustrate ray tracing techniques.

Production Vulkan drivers that include the Vulkan Ray Tracing extensions are now shipping for both AMD and NVIDIA GPUs, starting with the AMD Radeon Adrenalin 20.11.3 and NVIDIA R460 drivers for both GeForce and Quadro on Windows and Linux. The Vulkan Ray Tracing extensions will also be supported by Intel Xe-HPG GPUs, available in 2021, with driver support provided via regular driver updates.

Additional related materials on Vulkan Ray Tracing that you may find of interest include:

The Vulkan Working Group is excited to enable the developer and content creation communities to use Vulkan Ray Tracing and we welcome any feedback or questions. These can be shared through the Khronos Developer Slack and Vulkan GitHub Issues Tracker.

Welcome to the era of portable, cross-vendor, cross-platform ray tracing acceleration!