Khronos Blog

Ray Tracing In Vulkan

Updated: November 23, 2020
Please find the most recent version of this blog here.
This blog refers to the provisional versions of the Vulkan Ray Tracing extensions. On November 23rd, 2020, Khronos released the final Vulkan Ray Tracing extensions. The techniques in this blog can be used with the final extensions, which all developers are now encouraged to use.

Vulkan Ray Tracing Final Specification Release


Today the Khronos Vulkan Ray Tracing Task Sub Group (TSG) is announcing the public release of the provisional Vulkan Ray Tracing extensions. The Ray Tracing TSG was formed in early 2018 and tasked to bring a tightly integrated, cross-vendor, ray tracing solution to Vulkan. This release marks the culmination of the first phase of the TSG’s mandate.

The TSG received a number of design contributions from IHVs and examined requirements from both ISVs and IHVs. Real-time techniques for ray tracing are still being actively researched and so the first version of Vulkan Ray Tracing has been designed to provide an effective framework, while setting an extensible stage for future developments.

One overarching design goal was to provide a single, coherent cross-platform and multi-vendor framework for ray tracing acceleration that could be easily used together with existing Vulkan API functionality. We enabled selected parts of the framework to be optional for deployment flexibility in keeping with the Vulkan philosophy. For this first version, we are primarily aiming to expose the full functionality of modern desktop hardware.

ISVs were also very clear: we needed to enable content using contemporary proprietary APIs, such as NVIDIA OptiX™ or Microsoft DirectX Raytracing, to be easily portable to Vulkan. Consequently, we used a familiar overall architecture, including the re-use of HLSL shaders, while also introducing new functionality and implementation flexibility.

One critical use case for Vulkan Ray Tracing is real-time ray tracing in games - typically using a hybrid combination of a rasterized scene with some ray traced aspects. Some examples include rasterization post-processing after tracing primary rays, using ray tracing for shadow map generation, and dynamic light baking asynchronously with other system tasks. See Figures 1 and 2 for an example of hybrid rendering using Vulkan in Wolfenstein: Youngblood.

Vulkan Ray Tracing can also be used for accelerating offline production rendering and creative tools, for example offline light-map baking. There are many additional innovative techniques that can leverage an accelerated ray tracing framework, including non-rendering techniques. We look forward to hearing your ideas!

Figure 1: Wolfenstein: Youngblood - Ray Tracing OFF
Figure 2: Wolfenstein: Youngblood - Ray Tracing ON.
Note the improved reflections on the metal walls, on the floor through the doorway, and on the windows inside the room. It also improves the overall lighting in the scene and eliminates the lighting artifacts on the sides and floor of the doorway.

Introduction to the Vulkan Ray Tracing Extensions

The provisional version of Vulkan Ray Tracing that we are releasing today consists of a number of Vulkan, SPIR-V, and GLSL extensions.

The primary Vulkan extension is VK_KHR_ray_tracing which adds:

  • functionality for acceleration structure building and management,
  • support for ray tracing shader stages and pipelines, and
  • ray query intrinsics for all shader stages.

For those of you who are familiar with the VK_NV_ray_tracing extension, you will notice that this provides the same functionality, albeit with some changes and additions. Some of this functionality is optional, so be sure to check for the supported features and properties on your driver!

The features of this extension are described by the VkPhysicalDeviceRayTracingFeaturesKHR structure. The available features are:

  • rayTracing indicates support for ray tracing pipelines and shader stages
  • rayQuery indicates support for ray query functionality
  • rayTracingPrimitiveCulling indicates support for culling certain types of primitives during ray traversal
  • rayTracingIndirectTraceRays indicates support for tracing rays with dimensions sourced from a buffer
  • rayTracingIndirectAccelerationStructureBuild indicates support for building acceleration structures with geometry information sourced from a buffer
  • rayTracingHostAccelerationStructureCommands indicates support for CPU-based acceleration structure builds
  • rayTracingShaderGroupHandleCaptureReplay, rayTracingShaderGroupHandleCaptureReplayMixed, and rayTracingAccelerationStructureCaptureReplay indicate whether various forms of capture and replay debug functionality are available.

In the provisional release, only the rayTracing and rayTracingIndirectTraceRays features are required to be supported by implementations. It should also be noted that this extension also requires the descriptorIndexing and bufferDeviceAddress features from Vulkan 1.2 (or precursor extensions) to be supported.
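As a rough illustration of how an application might react to these feature bits, the sketch below picks a tracing mechanism from a small stand-in structure. The `RayTracingFeatures` struct, the `TraceMethod` enum, and both function names are hypothetical and are not part of Vulkan; in a real application the booleans would be read from VkPhysicalDeviceRayTracingFeaturesKHR after querying the physical device.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a few of the feature bits listed above (hypothetical type). */
typedef struct {
    bool rayTracing;
    bool rayQuery;
    bool rayTracingHostAccelerationStructureCommands;
} RayTracingFeatures;

/* Hypothetical application-side choice; not a Vulkan enum. */
typedef enum {
    TRACE_NONE,       /* neither mechanism is available */
    TRACE_RAY_QUERY,  /* ray queries inside existing shaders */
    TRACE_PIPELINE    /* dedicated ray tracing pipeline */
} TraceMethod;

/* Picks ray queries when the caller prefers them and they are supported,
   otherwise falls back to whichever mechanism the driver reports. */
TraceMethod choose_trace_method(const RayTracingFeatures *f, bool prefer_inline)
{
    assert(f);
    if (prefer_inline && f->rayQuery) return TRACE_RAY_QUERY;
    if (f->rayTracing) return TRACE_PIPELINE;
    if (f->rayQuery) return TRACE_RAY_QUERY;
    return TRACE_NONE;
}

/* Host (CPU) acceleration structure builds are optional, so check first. */
bool can_build_on_host(const RayTracingFeatures *f)
{
    assert(f);
    return f->rayTracingHostAccelerationStructureCommands;
}
```

Because rayQuery is optional in the provisional release, a fallback of this kind keeps an application working on drivers that only expose ray tracing pipelines.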

The queryable properties of this extension are described by the VkPhysicalDeviceRayTracingPropertiesKHR structure. Some notable items here are that the size of the shader header (shaderGroupHandleSize) is required to be exactly 32, and the maximum recursion depth (maxRecursionDepth) is only required to be 1.

Feedback on the provisional specification will help determine which features are optional and which features and properties are required in the final specification.

VK_KHR_ray_tracing depends on two additional extensions which have been added as building blocks for additional functionality. These extensions add infrastructure but do not enable the functionality on their own. Future extensions may build upon these extensions in other areas of the API, but at this point VK_KHR_ray_tracing enables specific uses of this functionality for ray tracing only.

VK_KHR_deferred_host_operations allows expensive driver operations to be offloaded to application-managed CPU thread pools which can enable work to be done on background threads or parallelized across multiple cores. With ray tracing this can be used for ray tracing pipeline compilation or CPU-based acceleration structure construction.

VK_KHR_pipeline_library provides the ability to provide a set of shaders which can be linked into pipelines. With ray tracing it can be useful when incrementally constructing ray tracing pipelines.

Shaders for use with the VK_KHR_ray_tracing extension are supplied to the API as SPIR-V binaries which use two new SPIR-V extensions:

Developers can generate these binaries using either GLSL or HLSL. For GLSL, there are two new extensions, GLSL_EXT_ray_tracing and GLSL_EXT_ray_query (enabled in shader source as GL_EXT_ray_tracing and GL_EXT_ray_query), which are supported in the open source glslang compiler and are provided along with this release.

HLSL support is also incoming via DXC, Microsoft's open source HLSL compiler, allowing Vulkan ray tracing shaders to be authored in HLSL using the syntax defined by Microsoft, with minimal modifications.

The following sections of this document go into more detail on the ray tracing functionality including creating and using acceleration structures, host and deferred operations, ray traversal, ray tracing pipelines, and ray queries.

Acceleration Structures

To achieve high performance on complex scenes, ray tracing performs ray intersections against an optimized data structure built over the scene information called an acceleration structure (AS). The acceleration structure is divided into a two-level hierarchy as shown in Figure 3. The lower level, the bottom-level acceleration structure, contains the triangles or axis-aligned bounding boxes (AABBs) of custom geometry that make up the scene. Since each bottom level acceleration structure may correspond to what would be multiple draw calls in the rasterization pipeline, each bottom level build can take multiple sets of geometry of a given type. The upper level, the top-level acceleration structure, contains references to a set of bottom-level acceleration structures, each reference including shading and transform information for that reference.

Figure 3: Acceleration Structure Hierarchy

Building either type of acceleration structure results in an opaque, implementation-defined format in memory. The bottom-level acceleration structure is only used by reference from the top-level acceleration structure. The top-level acceleration structure is accessed from the shader as a descriptor binding.
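To make the two-level layout concrete, here is a small CPU-side model of the hierarchy. It is only a sketch: the real acceleration structure is opaque and implementation-defined, and every type and function name below (`BottomLevel`, `Instance`, `TopLevel`, `instance_to_world`, `total_instanced_primitives`) is hypothetical. The only layout assumption taken from the API is that an instance transform is a row-major 3x4 matrix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { GEOMETRY_TRIANGLES, GEOMETRY_AABBS } GeometryKind;

/* One bottom-level structure: several geometries of a single kind. */
typedef struct {
    GeometryKind kind;
    const uint32_t *primitive_counts; /* one entry per geometry */
    size_t geometry_count;
} BottomLevel;

/* One top-level entry: a reference plus per-reference data. */
typedef struct {
    float transform[3][4];      /* row-major object-to-world matrix */
    uint32_t sbt_record_offset; /* shading information for this reference */
    const BottomLevel *blas;    /* the referenced bottom-level structure */
} Instance;

typedef struct {
    const Instance *instances;
    size_t instance_count;
} TopLevel;

/* Applies an instance transform to an object-space point. */
void instance_to_world(const Instance *inst, const float p[3], float out[3])
{
    assert(inst && p && out);
    for (int r = 0; r < 3; ++r) {
        out[r] = inst->transform[r][0] * p[0]
               + inst->transform[r][1] * p[1]
               + inst->transform[r][2] * p[2]
               + inst->transform[r][3];
    }
}

/* Counts primitives reachable from the top level. A bottom-level structure
   referenced by several instances is counted once per reference. */
uint64_t total_instanced_primitives(const TopLevel *tlas)
{
    assert(tlas);
    uint64_t total = 0;
    for (size_t i = 0; i < tlas->instance_count; ++i) {
        const BottomLevel *b = tlas->instances[i].blas;
        for (size_t g = 0; g < b->geometry_count; ++g)
            total += b->primitive_counts[g];
    }
    return total;
}
```

The model shows why instancing is cheap: two instances can point at the same bottom-level structure and differ only in their transform and shading data.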

An acceleration structure is created with vkCreateAccelerationStructureKHR. Like other objects in Vulkan, the acceleration structure creation just defines the “shape” of the acceleration structure and memory must be allocated and bound using vkBindAccelerationStructureMemoryKHR to the acceleration structure before further use. There is no support for sparse or dedicated allocations. In addition to querying the memory requirements for the acceleration structure allocation itself, vkGetAccelerationStructureMemoryRequirementsKHR returns sizes for the auxiliary buffers required during the build and update process.

Builds are performed with vk{Cmd}BuildAccelerationStructureKHR. For a bottom-level acceleration structure, the vertex data for triangles or the extent information for the AABBs is pulled from a buffer. A top-level acceleration structure pulls the shading, transform, and reference information for each instance from a structure in a buffer. An update to an acceleration structure is performed using the same functions with a special flag to indicate that an update of the positions from the existing acceleration structure is required.

Because the acceleration structure memory is in an implementation-defined, opaque format, there is a set of functions to perform operations on the acceleration structure data: vk{Cmd}CopyAccelerationStructureKHR, vk{Cmd}CopyAccelerationStructureToMemoryKHR, and vk{Cmd}CopyMemoryToAccelerationStructureKHR.

In addition to a basic copy, these functions can perform a restricted form of serialization and deserialization to save and restore acceleration structures with specific version compatibility requirements.

Acceleration structures may also end up with more space reserved for them than is required, so for large static acceleration structures, it can be beneficial to reduce the final amount of space used. An application can use vk{Cmd}WriteAccelerationStructuresPropertiesKHR to query the final compacted size then use vk{Cmd}CopyAccelerationStructureKHR to compact the acceleration structure.

Host Operations

Acceleration structures are very large resources, and managing them requires significant processing effort. Scheduling this work on a device alongside other rendering work can be tricky, particularly when host intervention is required. Vulkan provides both host and device variants of acceleration structure operations, allowing applications to better schedule these workloads. The device variants (vkCmd*AccelerationStructure*KHR) are enqueued into command buffers and executed on the device timeline, and the host variants (vk*AccelerationStructure*KHR) are executed directly on the host timeline.

Deferred Operations

Performing acceleration structure builds and updates on the CPU is a workload that is relatively easy to parallelize, and we wanted to be able to take advantage of that in Vulkan. An application can execute independent commands on independent threads, but this approach requires that there be enough commands available to fully utilize the machine. It can also lead to imbalanced loads, since some commands might take significantly longer than others.

In order to avoid these snags, we added deferred operations to enable intra-command parallelism: spreading work for a single command across multiple CPU cores. A driver-managed thread pool is one way to achieve this, but is not in keeping with the low-level explicit philosophy of Vulkan. Applications also run their own thread pools, and it is preferable to enable these threads to perform the work, so that the application can manage the execution of driver work together with the rest of its load.

Deferred host operations are designed around a “division of labor” principle. The application is responsible for:

  • Setting up commands and requesting deferred execution.
  • Assigning worker threads to execute deferred commands.
  • Setting priorities and CPU budgets as it sees fit, by choosing which tasks to execute, and when to execute them.

The driver is responsible for:

  • Tracking the execution state of a deferred command.
  • Implementing distributed execution, using whatever parallel constructs are most appropriate for the workload (tasks, parallel loops, dependency graphs, work queues, and the like).

In this way, the application controls the allocation and prioritization of work, but the driver manages the low-level details.

To use deferred operations, the application first constructs a VkDeferredOperationKHR object, which encapsulates the execution state of a deferred command. This object will be in one of two states (Complete or Pending) throughout its life cycle, as shown in Figure 4.

Figure 4: Deferred Operation State Diagram

A deferred operation is constructed in the Complete state. The application issues a deferral request for a command by appending a new extension structure to the pNext chain of a command argument structure. If the driver honors the deferral request, the deferred operation transitions to the Pending state. Note that drivers are free to deny the request and simply execute the command in place, causing it to immediately become complete.

Once deferred, an operation will not progress until the application joins a thread to it by calling vkDeferredOperationJoinKHR. The join command instructs the driver to use the calling thread to process the command associated with a given deferred operation. An application may join any number of threads to a deferred operation, and doing so will generally cause the command to complete more quickly. The operation becomes Complete whenever at least one joined thread has observed a VK_SUCCESS return value from vkDeferredOperationJoinKHR. Note that if multiple threads have joined the deferred operation, the implementation may return early from the join if it knows that it has more threads joined than it is able to utilize.
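The life cycle described above can be modelled in a few lines. This is a single-threaded toy, not the Vulkan API: `DeferredOp`, the notion of "work units", and the three functions are hypothetical stand-ins for VkDeferredOperationKHR, the driver's internal work, and the deferral and join calls.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { OP_COMPLETE, OP_PENDING } OpState;

/* Toy stand-in for a deferred operation: state plus remaining work. */
typedef struct {
    OpState state;
    int remaining_units; /* hypothetical measure of outstanding driver work */
} DeferredOp;

/* A deferred operation is constructed in the Complete state. */
DeferredOp deferred_op_create(void)
{
    DeferredOp op = { OP_COMPLETE, 0 };
    return op;
}

/* Requests deferral of a command worth `units` of work. If the "driver"
   denies the request (honor == false) the command is treated as executed
   in place and the operation stays Complete. Returns true if Pending. */
bool deferred_op_request(DeferredOp *op, int units, bool honor)
{
    assert(op && op->state == OP_COMPLETE && units >= 0);
    if (!honor || units == 0) return false;
    op->remaining_units = units;
    op->state = OP_PENDING;
    return true;
}

/* Models one join call: the caller contributes up to `budget` units.
   Returns true once the operation is Complete. */
bool deferred_op_join(DeferredOp *op, int budget)
{
    assert(op && budget > 0);
    if (op->state == OP_COMPLETE) return true;
    int done = budget < op->remaining_units ? budget : op->remaining_units;
    op->remaining_units -= done;
    if (op->remaining_units == 0) op->state = OP_COMPLETE;
    return op->state == OP_COMPLETE;
}
```

In this model nothing progresses between joins, which mirrors the rule that a deferred operation only advances while application threads are joined to it. A real implementation would distribute the work across several joined threads at once.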

Use Case: Simplified Compaction

Compaction is a very important optimization for reducing the memory footprint of ray tracing acceleration structures. Acceleration structure construction with compaction looks like this:

  1. Determine the worst-case memory requirement for an acceleration structure
  2. Allocate device memory
  3. Build the acceleration structure
  4. Determine the compacted size
  5. Synchronize with the GPU
  6. Allocate device memory
  7. Perform a compacting copy

In order to allocate memory for a compacted acceleration structure, an application needs to know its size. To determine the size, it needs to submit a command buffer for steps 3 and 4, and wait for it to finish.

This detail causes alarm bells to ring in the minds of experienced engine developers. If done naively, this sort of host/device handshaking can seriously degrade performance. If done well, it is a significant source of complexity, and can cause spikes in an application’s device memory footprint, because the uncompacted acceleration structures need to live in device memory for at least one frame.

Host builds allow us to remove both of these drawbacks. Using host builds, we can implement compaction by performing the initial build on the host, and then performing a compacting copy from host memory to device memory. This copy still requires monitoring so that the app can recover the host memory, but this is a more familiar pattern, one which engines already implement for uploading texture and geometry data to the device.
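A back-of-the-envelope model of the two approaches makes the memory argument easier to see. The sketch assumes that, for a device build, the worst-case allocation and the compacted copy both live in device memory while the compacting copy runs, and it ignores scratch buffers. The `Footprint` type and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t peak_device_bytes;
    uint64_t peak_host_bytes;
} Footprint;

/* Device build: the uncompacted structure stays in device memory until the
   compacting copy into the second allocation has finished. */
Footprint device_build_footprint(uint64_t worst_case, uint64_t compacted)
{
    assert(compacted <= worst_case);
    Footprint f = { worst_case + compacted, 0 };
    return f;
}

/* Host build: the uncompacted structure lives in host memory and only the
   compacted result is copied to the device. */
Footprint host_build_footprint(uint64_t worst_case, uint64_t compacted)
{
    assert(compacted <= worst_case);
    Footprint f = { compacted, worst_case };
    return f;
}
```

Under these assumptions the host path moves the temporary spike from device memory to host memory, which is usually the more plentiful resource.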

Use Case: Load Balancing

Host acceleration structure builds provide opportunities to improve performance by leveraging otherwise idle CPUs. Consider a hypothetical profile from a game:

Figure 5: Load balancing: No Host Build

In Figure 5, acceleration structure construction and updates are implemented on the device, but the application has considerable CPU time to spare. Moving these operations to the host allows the CPU to execute the next frame’s acceleration structure work in parallel with the previous frame’s rendering. This can improve throughput, even if the CPU requires more wall-clock time to perform the same task, as shown in Figure 6.

Figure 6: Load Balancing: Host Build Enabled
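The throughput argument can be checked with simple arithmetic. The sketch below assumes a steady state in which the CPU and GPU work on different frames in parallel, so the slower of the two sets the frame time. The function names and the timing parameters are hypothetical and do not come from a real profile.

```c
#include <assert.h>

static double max2(double a, double b) { return a > b ? a : b; }

/* Device build: the GPU renders and also builds acceleration structures. */
double frame_time_device_build(double cpu_app_ms, double gpu_render_ms,
                               double gpu_build_ms)
{
    assert(cpu_app_ms >= 0.0 && gpu_render_ms >= 0.0 && gpu_build_ms >= 0.0);
    return max2(cpu_app_ms, gpu_render_ms + gpu_build_ms);
}

/* Host build: the build cost moves to the CPU, next to the application work. */
double frame_time_host_build(double cpu_app_ms, double cpu_build_ms,
                             double gpu_render_ms)
{
    assert(cpu_app_ms >= 0.0 && cpu_build_ms >= 0.0 && gpu_render_ms >= 0.0);
    return max2(cpu_app_ms + cpu_build_ms, gpu_render_ms);
}
```

With illustrative numbers of 6 ms of application CPU work, 12 ms of rendering, a 4 ms device build and an 8 ms host build, the model gives 16 ms per frame for the device build and 14 ms for the host build, even though the host build itself takes twice as long.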

Ray Traversal

Tracing a ray against an acceleration structure in Vulkan goes through a number of logical phases, giving some flexibility on how rays are traced. Intersection candidates are initially found based purely on their geometric properties - is there an intersection along the ray with the geometric object described in the acceleration structure?

Intersection testing is watertight in Vulkan - meaning for a single geometric object described in an acceleration structure, rays cannot leak through gaps between triangles, and multiple hits cannot be reported for different triangles at the same position. This is not guaranteed for neighboring objects that happen to abut, but it means individual models will not have holes in them, or be shaded excessively.

Once a candidate is found, a series of culling operations occur before the intersection is confirmed. These culling operations discard candidates based on flags used for traversal, and properties of the acceleration structure; details of these are in the specification. Remaining opaque triangle candidates are confirmed as valid intersections; whereas AABBs and non-opaque triangles require shader code to programmatically determine whether a hit occurred.

Traversal proceeds until all possible candidates are found and either confirmed or discarded, and a closest hit is determined. Traversal can also be made to end early to avoid unnecessary processing. This can be useful for detecting occlusion, or as an optimization in certain cases.
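The logical phases can be sketched as a loop over candidates on the CPU. This is a simplified model rather than how any implementation traverses an acceleration structure: candidates arrive as a flat array, culling is reduced to a mask test, and the shader's decision for AABBs and non-opaque triangles is a precomputed boolean. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    float t;             /* distance along the ray */
    uint8_t mask;        /* visibility mask of the candidate */
    bool opaque;         /* opaque triangles are confirmed without shader code */
    bool shader_accepts; /* stand-in for the shader's decision otherwise */
} Candidate;

typedef struct {
    float t_min, t_max;
    uint8_t cull_mask;
    bool terminate_on_first_hit; /* end traversal early, e.g. for occlusion */
} Ray;

/* Returns the index of the closest confirmed hit, or -1 on a miss. */
int trace(const Ray *ray, const Candidate *c, size_t n)
{
    assert(ray && (c || n == 0));
    int closest = -1;
    float t_closest = ray->t_max;
    for (size_t i = 0; i < n; ++i) {
        /* Geometric phase: keep candidates inside the current interval. */
        if (c[i].t < ray->t_min || c[i].t >= t_closest) continue;
        /* Culling phase: discard candidates rejected by the mask. */
        if ((c[i].mask & ray->cull_mask) == 0) continue;
        /* Confirmation phase: non-opaque candidates need shader approval. */
        if (!c[i].opaque && !c[i].shader_accepts) continue;
        closest = (int)i;
        t_closest = c[i].t;
        if (ray->terminate_on_first_hit) break;
    }
    return closest;
}
```

Shrinking the interval to the closest confirmed hit so far is what lets later, more distant candidates be rejected without any shader work.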

Tracing rays and getting traversal results can be done via one of two mechanisms in Vulkan: Ray Tracing Pipelines and Ray Queries (see Figure 7):

  • Ray queries provide direct access to ray traversal logic in any shader stage, allowing them to be plugged into existing shaders and enhancing the effects those shaders express.
  • Ray tracing pipelines provide a dedicated ray tracing mechanism with dynamic shader selection, enabling significant flexibility in the materials used in a scene and programmable intersection logic.
Figure 7: Ray Tracing Architecture in Vulkan

Ray Tracing Pipelines

Applications can associate specific shaders with objects in a scene, defining things like material parameters and intersection logic for those objects. As traversal progresses, when a ray intersects an object, associated shaders are automatically executed by the implementation (see Figure 8). A ray tracing pipeline is similar to a graphics pipeline in Vulkan, but with added functionality to manage having significantly more shaders and to put references to specific shaders into memory.

Ray tracing pipeline work is launched using vkCmdTraceRaysKHR with a currently bound ray tracing pipeline. This command invokes an application-defined set of ray generation threads, which can call traceRayEXT() from the shader, starting traversal work on the specified acceleration structure. During traversal, if required by the trace and acceleration structure, application shader code in intersection and any-hit shaders can control how traversal proceeds. After traversal completes, either a miss or closest hit shader is invoked.

Callable shaders may be invoked using the same shader selection mechanism, but outside of the direct traversal context.

The different shader stages can communicate parameters and results using ray payload structures between all traversal stages and ray attribute structures from the traversal control shaders.

Figure 8: Ray Tracing Pipeline flow diagram.

To enable the traversal phase to know which shader to invoke after a given step of traversal to control or respond to the traversal, the implementation uses a shader binding table. Each shader entry consists of a shader group handle queried from the implementation for a given shader group plus an optional shader buffer record which the application may use for instance-specific data such as buffer device addresses or descriptor indices. The address for any given shader is computed by the traversal through a combination of parameters to the trace rays API call, parameters to the traceRayEXT shader call, and information stored in the acceleration structure.
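The address computation can be written out as plain arithmetic. The sketch follows the indexing rule as we understand it from the specification: the hit record index combines the instance's record offset, the geometry index scaled by the stride passed to traceRayEXT, and the offset passed to traceRayEXT, while the miss record is selected directly by the miss index. `SbtRegion` and the function names are hypothetical stand-ins for the table regions passed to the trace rays call.

```c
#include <assert.h>
#include <stdint.h>

/* One region of the shader binding table (hypothetical stand-in). */
typedef struct {
    uint64_t base_address; /* address of the first record in the region */
    uint64_t stride;       /* bytes per record: handle plus optional data */
} SbtRegion;

/* Address of the hit group record selected for an intersection. */
uint64_t hit_record_address(const SbtRegion *hit,
                            uint32_t instance_sbt_offset, /* from the instance */
                            uint32_t geometry_index,      /* from the bottom level */
                            uint32_t sbt_record_offset,   /* traceRayEXT argument */
                            uint32_t sbt_record_stride)   /* traceRayEXT argument */
{
    assert(hit);
    uint64_t index = (uint64_t)instance_sbt_offset
                   + (uint64_t)geometry_index * sbt_record_stride
                   + sbt_record_offset;
    return hit->base_address + hit->stride * index;
}

/* Address of the miss record selected by the traceRayEXT miss index. */
uint64_t miss_record_address(const SbtRegion *miss, uint32_t miss_index)
{
    assert(miss);
    return miss->base_address + miss->stride * (uint64_t)miss_index;
}
```

Laying the table out so that this index lands on the intended record for every instance and geometry is the application's responsibility, so it is worth validating the arithmetic on the CPU when debugging wrong-material artifacts.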

Ray tracing pipelines can be created directly as with other pipeline types, but because ray tracing pipelines can have orders of magnitude more shaders than other pipeline types and we may want to add shaders, the extension adds another mechanism: pipeline libraries. A pipeline library is a pipeline including state and shaders with an additional flag to indicate that it is not intended to be bound directly to the API but is intended to be used as a library of code to be included in a later pipeline. Pipeline libraries can be used in multiple ray tracing pipelines, allowing reuse of shader compilation in multiple pipelines. A ray tracing pipeline creation may include a set of pipeline libraries as well as a set of ray tracing shaders. All of the compile state from each shader must match to create a compatible final pipeline. In addition to pipeline libraries, deferred host operations can be used in ray tracing pipeline construction to enable further parallelization.

Note that while pipeline libraries are exposed as a separate extension, they are only currently integrated for use with ray tracing pipelines.

Example Ray Pipeline Shaders (GLSL)

// Ray generation shader
#version 460 core
#extension GL_EXT_ray_tracing : enable
layout(location = 0) rayPayloadEXT vec4 payload;
layout(binding = 0, set = 0) uniform accelerationStructureEXT acc;
layout(binding = 1, set = 0, rgba32f) uniform image2D img;
// Uses binding 2 so that the block does not collide with the image at binding 1.
layout(binding = 2, set = 0) uniform rayParams
{
    vec3 rayOrigin;
    vec3 rayDir;
    uint sbtOffset;
    uint sbtStride;
    uint missIndex;
};

// Application-supplied helper that derives a per-invocation ray direction.
// Its definition is not part of this example.
vec3 computeDir(vec3 dir, uvec3 launchID, uvec3 launchSize);

void main() {
    traceRayEXT(acc, gl_RayFlagsOpaqueEXT, 0xff, sbtOffset,
                sbtStride, missIndex, rayOrigin, 0.0,
                computeDir(rayDir, gl_LaunchIDEXT, gl_LaunchSizeEXT),
                100.0f, 0 /* payload */);
    // The closest hit or miss shader has written its result into payload.
    imageStore(img, ivec2(gl_LaunchIDEXT.xy), payload);
}

// Closest hit shader
#version 460 core
#extension GL_EXT_ray_tracing : enable
layout(location = 0) rayPayloadInEXT vec4 payload;

void main() {
    // Report a hit as opaque green.
    payload = vec4(0.0, 1.0, 0.0, 1.0);
}

// Miss shader
#version 460 core
#extension GL_EXT_ray_tracing : enable
layout(location = 0) rayPayloadInEXT vec4 payload;

void main() {
    // Report a miss as transparent black.
    payload = vec4(0.0, 0.0, 0.0, 0.0);
}

Ray Queries

Ray queries can be used to perform ray traversal and get a result back in any shader stage. Other than requiring acceleration structures, ray queries are performed using only a set of new shader instructions.

Ray queries are initialized with an acceleration structure to query against, ray flags determining properties of the traversal, a cull mask, and a geometric description of the ray being traced.

Properties of potential and committed intersections, and of the ray query itself, are accessible to the shader during traversal, enabling complex decision making based on what geometry is being intersected, how it is being intersected, and where (see Figure 9).

Figure 9: Ray Query flow diagram

Ray Query Example (GLSL)

The following is an incomplete example of ray queries in GLSL, illustrating how a shader could use ray queries to detect whether a given position is in shadow or not. This could be added to a fragment shader to feed into lighting calculations. The overall structure for most ray queries will usually be similar - initialize, proceed in a loop, then make a final determination.

rayQueryEXT rayQuery;
rayQueryInitializeEXT(rayQuery, accelerationStructure,
                      gl_RayFlagsTerminateOnFirstHitEXT,
                      cullMask, origin, tMin, direction, tMax);

while(rayQueryProceedEXT(rayQuery)) {
    if (rayQueryGetIntersectionTypeEXT(rayQuery, false) ==
        gl_RayQueryCandidateIntersectionTriangleEXT)
    {
        ... // Determine if an opaque triangle hit occurred
        if (opaqueHit) rayQueryConfirmIntersectionEXT(rayQuery);
    }
    else if (rayQueryGetIntersectionTypeEXT(rayQuery, false) ==
             gl_RayQueryCandidateIntersectionAABBEXT)
    {
        ... // Determine if an opaque hit occurred in an AABB
        if (opaqueHit) rayQueryGenerateIntersectionEXT(rayQuery, ...);
    }
}

if (rayQueryGetIntersectionTypeEXT(rayQuery, true) ==
    gl_RayQueryCommittedIntersectionNoneEXT)
{
    // Not shadow!
} else {
    // Shadow!
}

Call for Feedback!

Khronos welcomes feedback on the Vulkan Ray Tracing set of provisional specifications from the developer and content creation communities through the Khronos Developer Slack and Vulkan GitHub Issues Tracker. Developers are also encouraged to share comments with their preferred hardware vendors.

A provisional release enables us to ship beta drivers and enable application prototyping to catalyze developer feedback. It also enables us to work on various open-source ecosystem artifacts in public, such as high-level compilers, validation layers, and debuggers, before spec finalization. Your feedback is critical to enable us to finalize the first version of Vulkan Ray Tracing and make it genuinely meet your needs!

However, as this is a provisional release, some functionality is likely to change before the final release. Consequently, we are asking that driver vendors not ship it in production drivers and that ISVs not use the provisional version in production applications.

Applications using the provisional functionality must specifically opt into the interfaces being defined in the Vulkan header using one of the following techniques (similar to the process for enabling the windowing system extensions), either by:

#define VK_ENABLE_BETA_EXTENSIONS
#include <vulkan/vulkan.h>

or by

#include <vulkan/vulkan_core.h>
#include <vulkan/vulkan_beta.h>

and should also check for the exact Vulkan extension version that they are expecting.
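One way to perform that check is to scan the device's extension list for the name and compare the reported revision. The sketch below works on a stand-in `ExtensionInfo` struct that mirrors the `extensionName` and `specVersion` fields of VkExtensionProperties, so that it compiles without the Vulkan headers. The function name is hypothetical, and the expected revision is whatever your application was written against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in mirroring the two fields of VkExtensionProperties used here. */
typedef struct {
    char extensionName[256];
    uint32_t specVersion;
} ExtensionInfo;

/* Returns true only if `name` is present with exactly `expected_version`.
   A newer or older provisional revision is treated as unsupported. */
bool has_exact_extension_version(const ExtensionInfo *exts, size_t count,
                                 const char *name, uint32_t expected_version)
{
    assert(name && (exts || count == 0));
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(exts[i].extensionName, name) == 0)
            return exts[i].specVersion == expected_version;
    }
    return false;
}
```

Requiring an exact match rather than a minimum is deliberate here, since a provisional interface may change incompatibly between revisions.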

Additionally, the SPIR-V capabilities have “Provisional” in their names, and the tokens will be given different values if the functionality changes before the final release. This will enable offline inspection on shader binaries to determine which specification they were compiled for.

Although we do not have a specific timeframe for specification finalization, we want to move forward as quickly as we can, while ensuring the developer community is happy and we have a completed set of conformance tests and at least two implementations that can pass those tests.

Where Can I Get More Information?

Vulkan Specifications:

SPIR-V Specifications:

GLSL Extension Specifications:

Driver release updates and the status of Vulkan ecosystem components will be posted on the Vulkan Ray Tracing Provisional Release Tracker. A Vulkan SDK that includes support for Vulkan Ray Tracing will become available once all the necessary ecosystem components are upstreamed. Check this link to watch for its availability.

Thank you for your interest and assistance in making Vulkan Ray Tracing truly effective for your applications!