Vulkan Ray Tracing Best Practices for Hybrid Rendering

Exploring ray tracing techniques in Wolfenstein: Youngblood

Real time ray traced reflections in Wolfenstein: Youngblood (left - without reflections, right - with ray traced reflections)

Overview

Today, the Khronos® Vulkan® Working Group has released the final Vulkan Ray Tracing extensions that seamlessly integrate ray tracing functionality alongside Vulkan’s rasterization framework, making Vulkan the industry’s first open, cross-vendor, cross-platform standard for ray tracing acceleration. The final ray tracing functionality is defined by a set of five extensions, namely VK_KHR_acceleration_structure, VK_KHR_ray_tracing_pipeline, VK_KHR_ray_query, VK_KHR_pipeline_library, and VK_KHR_deferred_host_operations. ISVs played a pivotal role in shaping the extensions to enable hybrid rendering, where rasterization and ray tracing are used in tandem to achieve compelling levels of visual fidelity and interactivity. One example of a game using this hybrid approach in Vulkan is Wolfenstein: Youngblood, whose implementation of ray traced reflections we will be looking at in depth in this post, while also discussing more general aspects of real-time ray tracing with Vulkan. Wolfenstein: Youngblood necessarily shipped with an earlier version of the Vulkan Ray Tracing extensions, but the techniques used in the game are described here using the final production extensions, with which they can be fully implemented and which all developers should use now that they are available.

This post assumes some familiarity with the Vulkan API.

Ray Tracing Primer

Acceleration structures

When developing a ray traced application, one of the first aspects to focus on is optimal representation of geometric data. Similar to the construct of a scene graph in classical rasterized applications, the geometry has to be organized into an acceleration structure (AS) which ultimately stores a transformation matrix for each child node present in the structure.

Figure 1. Organisation and interaction of the bottom and top acceleration structures with the shader binding table

The purpose of an acceleration structure (Fig. 1) is to reduce the overall number of ray intersection tests which have to be carried out on a per-frame basis. Therefore, an AS is commonly a tree structure such as a Bounding Volume Hierarchy (BVH) divided into a top level and a bottom level. The top level structure (TLAS) consists of instances which reference the bottom level structure (BLAS) which, in turn, contains the actual vertex and index data, along with an axis-aligned bounding box (AABB) to encapsulate the geometry. Each instance in the TLAS references not only the transform data, but also shading information, which is how shader programs make the connection between the intersected geometry and its counterpart surface material data from the shader binding table (SBT). As covered later in the article, it is sufficient to provide a unique index per instance that maps the TLAS data to the material to be used in the case of a potential hit.

The VK_KHR_acceleration_structure extension enables the building and updating of the AS. Invoking vkCmdBuildAccelerationStructuresKHR() (on the device) with minimal information such as the VkCommandBuffer and a VkAccelerationStructureBuildGeometryInfoKHR will build the AS. If the AS requires updating, then the mode member of the geometry info structure is set to VK_BUILD_ACCELERATION_STRUCTURE_MODE_UPDATE_KHR. For an update to take place, the application must provide a description of the acceleration structure which can have new information for instances, transforms, and vertex positions. All other defining characteristics of the AS have to remain unchanged. This is achieved by populating the two VkAccelerationStructureKHR members (srcAccelerationStructure, dstAccelerationStructure) of the geometry info structure with which the original BVH was built. Due to the restriction imposed on which parameters can differ between the source and destination AS, operations such as transitioning active primitives (src) to an inactive state (dst), or vice versa, cannot be performed through an update. These cases require a full acceleration structure rebuild instead.
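The choice between an update and a rebuild can be sketched on the CPU side as follows. This is a minimal illustration, not engine code: GeometryDesc, BuildMode and chooseBuildMode are hypothetical names, and only a subset of the properties that must stay constant across an update is checked.

```cpp
#include <cassert>  // used by the checks in the usage note
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side description of one geometry inside a BLAS.
// These names are illustrative and are not part of the Vulkan API.
struct GeometryDesc {
    std::uint32_t primitiveCount;  // triangles recorded when the BLAS was built
    std::uint32_t maxVertex;       // highest vertex index that can be referenced
    bool          active;          // false if the primitives are marked inactive
};

enum class BuildMode { Build, Update };

// Returns Update only when the new description keeps the same shape as the
// one the BLAS was built with, so that only positions or transforms differ.
// Any change in geometry count, primitive count, vertex range or active
// state forces a full rebuild, as does a BLAS built without update support.
BuildMode chooseBuildMode(const std::vector<GeometryDesc>& built,
                          const std::vector<GeometryDesc>& next,
                          bool builtWithAllowUpdate) {
    if (!builtWithAllowUpdate || built.size() != next.size()) {
        return BuildMode::Build;
    }
    for (std::size_t i = 0; i < built.size(); ++i) {
        const bool sameShape = built[i].primitiveCount == next[i].primitiveCount &&
                               built[i].maxVertex == next[i].maxVertex;
        const bool sameState = built[i].active == next[i].active;
        if (!sameShape || !sameState) {
            return BuildMode::Build;
        }
    }
    return BuildMode::Update;
}
```

For instance, assert(chooseBuildMode(g, g, true) == BuildMode::Update) holds for any description g, while flipping a single active flag yields BuildMode::Build. The builtWithAllowUpdate parameter stands for the VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR flag having been set on the original build.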

Ray tracing descriptor set and the ray tracing pipeline

A common approach in rasterized applications is to specify a descriptor set per type of material and consequently create a pipeline for each material required. In contrast, ray tracing shifts this paradigm - a ray can potentially hit any of the materials present in the scene and invoke a particular shader. Therefore, it is necessary to bundle all the required resources up-front, in a single set of descriptors. In hybrid applications, which rely on both rasterization and ray tracing, the ray tracing pipeline will use two descriptor sets - one for referencing the scene information (as used by rasterization), and another for referencing the acceleration structures and indices for the SBT.

Figure 2. The ray tracing mechanism achieved through the five shader stages

The scene descriptor set typically used in a rasterized scenario should be extended to allow the ray tracing shaders access to the camera transform. In general, the ray generation shader and closest hit shader (Fig. 2) will require references to materials, texture data, and buffers holding geometry (vertex/index).

A payload structure must be defined and passed to the ray tracing shaders in order to keep track of a ray’s state throughout the traversal process. The payload is used to communicate persistent information between shaders and determine how to resolve a hit or a miss.

VK_KHR_ray_tracing_pipeline supports five shader types which operate on the assumption of a predefined ray payload structure rayPayloadEXT to hold intermediate results from the shading stages:

  • Ray generation shader: is similar to a compute shader and it represents the starting point for ray tracing through the invocation of traceRayEXT(). Moreover, it processes the results from the hit group.

  • Closest hit shader: is executed when the ray intersects the closest instance. An application can support any number of closest hit shaders. This is typically used for carrying out lighting calculations and it can recursively trace rays.

  • Miss shader: is executed instead of a closest hit shader when a ray does not intersect any geometry during traversal. It can access the ray payload and trace rays, however, it cannot access attributes since it is not associated with an intersection. A common use for a miss shader is to sample an environment map.

  • Intersection shader: the built-in intersection test is ray-triangle. The intersection shader allows for custom intersection handling. Much like the built-in intersection program which writes the barycentric coordinates of the hit point in the attributes for the closest hit shader and any-hit shader to read, the intersection shader operates in a similar fashion. It does not have access to the ray payload.

  • Any-hit shader: similar to the closest hit shader, it is executed after an intersection is reported and can modify the ray payload. The difference is that any-hit shader considers any intersection in the ray interval defined by [tmin, tmax] and not the closest one to the origin of the ray. The any-hit shader is used to filter an intersection and therefore is often used to implement alpha-testing.

It is noteworthy that the VK_KHR_ray_tracing_pipeline extension enables the use of callable shaders in any of the ray tracing shaders. A callable shader is a type of program capable of accessing a callable payload (similar to a ray payload) in order to carry out a subroutine. This type of shader can, for example, be used to replace if-else blocks inside the closest hit shader in order to resolve different lighting calculations.

As with rasterization, the new ray tracing shaders undergo compilation into SPIR-V, which in turn are linked into a single ray tracing pipeline responsible for directing the intersection calculations to the appropriate shader program.

A small number of global uniform values (the maximum size of which can be queried through VkPhysicalDeviceLimits::maxPushConstantsSize) can be stored in push constants, which is especially useful for values that undergo frequent modification between draw calls, such as transformation matrices. Push constant ranges are declared at pipeline creation time and are supported across the ray tracing shaders.
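As a small sketch of how an application might budget its push constants, the block below checks a hypothetical RayTracingPushConstants structure against the device limit. The structure and its members are assumptions made for illustration; the 128-byte figure is the minimum value of maxPushConstantsSize that the Vulkan specification requires.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-dispatch constants shared by the ray tracing stages.
struct RayTracingPushConstants {
    float         viewInverse[16];  // camera transform, 4x4 matrix
    std::uint32_t frameIndex;
    float         maxRayDistance;
    std::uint32_t padding[2];       // keeps the block a multiple of 16 bytes
};

// Minimum maxPushConstantsSize every Vulkan implementation must support.
constexpr std::uint32_t kGuaranteedPushConstantBytes = 128;

// True when a block of `size` bytes fits the limit reported by the device.
constexpr bool fitsPushConstantLimit(std::size_t size,
                                     std::uint32_t maxPushConstantsSize) {
    return size <= maxPushConstantsSize;
}

static_assert(sizeof(RayTracingPushConstants) % 4 == 0,
              "push constant ranges are sized in multiples of 4 bytes");
static_assert(fitsPushConstantLimit(sizeof(RayTracingPushConstants),
                                    kGuaranteedPushConstantBytes),
              "block must fit the guaranteed minimum limit");
```

Checking against the guaranteed minimum at compile time keeps the layout portable; an application that wants a larger block would compare against the queried limit at run time instead.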

To enable the connection to an SBT, information for each shader stage can be stored in an array where the unique indices will map the SBT entries through the VkRayTracingShaderGroupCreateInfoKHR structure. The shader group information contains two notable members:

  • VkRayTracingShaderGroupTypeKHR - for specifying the associated hit groups
  • Handles for general shaders - the index for the ray generation, miss, and callable shaders specified in the array of shader stage information (VkRayTracingPipelineCreateInfoKHR).

When a shader is not specified as part of a group (for example, when not defining a custom intersection shader), the appropriate shader handle member is set to VK_SHADER_UNUSED_KHR which indicates to the driver that the default built-in shader should be used, and not that the shader should be absent altogether.
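The relationship between shader stages, groups and the unused marker can be modelled with a few plain structures. This is a simplified stand-in for VkRayTracingShaderGroupCreateInfoKHR that avoids a dependency on the Vulkan headers; the helper names are hypothetical, and isValid checks only a subset of the rules an implementation enforces.

```cpp
#include <cassert>
#include <cstdint>

// Same value as VK_SHADER_UNUSED_KHR in the Vulkan headers (~0U).
constexpr std::uint32_t kShaderUnused = ~0u;

// Simplified mirror of VkRayTracingShaderGroupTypeKHR.
enum class GroupType { General, TrianglesHitGroup, ProceduralHitGroup };

// Each member is an index into the pipeline's array of shader stages,
// or kShaderUnused when the group has no shader of that kind.
struct ShaderGroup {
    GroupType     type               = GroupType::General;
    std::uint32_t generalShader      = kShaderUnused;
    std::uint32_t closestHitShader   = kShaderUnused;
    std::uint32_t anyHitShader       = kShaderUnused;
    std::uint32_t intersectionShader = kShaderUnused;
};

// Group for a ray generation, miss or callable shader.
ShaderGroup makeGeneralGroup(std::uint32_t stageIndex) {
    assert(stageIndex != kShaderUnused && "a general group needs a shader");
    ShaderGroup g;
    g.type = GroupType::General;
    g.generalShader = stageIndex;
    return g;
}

// Hit group that relies on the built-in ray-triangle intersection test.
ShaderGroup makeTrianglesHitGroup(std::uint32_t closestHit,
                                  std::uint32_t anyHit = kShaderUnused) {
    ShaderGroup g;
    g.type = GroupType::TrianglesHitGroup;
    g.closestHitShader = closestHit;
    g.anyHitShader = anyHit;
    return g;  // intersectionShader stays unused: built-in test applies
}

// Hit group with a custom intersection shader.
ShaderGroup makeProceduralHitGroup(std::uint32_t intersection,
                                   std::uint32_t closestHit) {
    assert(intersection != kShaderUnused && "procedural groups need one");
    ShaderGroup g;
    g.type = GroupType::ProceduralHitGroup;
    g.intersectionShader = intersection;
    g.closestHitShader = closestHit;
    return g;
}

// Checks a subset of the consistency rules between type and members.
bool isValid(const ShaderGroup& g) {
    switch (g.type) {
    case GroupType::General:
        return g.generalShader != kShaderUnused &&
               g.closestHitShader == kShaderUnused &&
               g.anyHitShader == kShaderUnused &&
               g.intersectionShader == kShaderUnused;
    case GroupType::TrianglesHitGroup:
        return g.intersectionShader == kShaderUnused;
    case GroupType::ProceduralHitGroup:
        return g.intersectionShader != kShaderUnused;
    }
    return false;
}
```

A pipeline with stages {raygen, miss, closest hit} would then use makeGeneralGroup(0), makeGeneralGroup(1) and makeTrianglesHitGroup(2), and the position of each group in that list is what the SBT entries refer to.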

The Shader Binding Table (SBT)

The SBT is, in essence, an array of unique handles referencing shaders, or shader groups, which will be used during the ray tracing process. Each handle is 32 bytes in size (the value reported as shaderGroupHandleSize), and the entries generally map to: the ray generation shader for kickstarting the ray tracing process, the miss shader for sampling the environment map if no BLAS geometry is intersected, and the hit group (closest hit, any-hit shaders) for sampling the correct material at the intersection point. As the SBT indicates which hit group will be executed for each instance, it is important to associate instances and shader groups when creating the AS. This is achieved by providing a hitGroupID per instance in the TLAS (the instanceShaderBindingTableRecordOffset member of VkAccelerationStructureInstanceKHR in the final extensions). The ID will map into the SBT hit groups. Consequently, the SBT size is dictated by the number of groups, the handle sizes, and the shader alignment, with the assumption that the ray generation shader is always at index 0 in the SBT. Information describing the SBT and its contents (address, stride, and size) is required when invoking vkCmdTraceRaysKHR().
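The sizing rules can be made concrete with a short layout calculation. The sketch below assumes one buffer holding a ray generation region, then the hit groups, then the miss shaders; SbtProperties, SbtRegion, SbtLayout and computeSbtLayout are hypothetical names, and the three property values would come from VkPhysicalDeviceRayTracingPipelinePropertiesKHR. No per-record shader data is assumed, so each record is just a handle.

```cpp
#include <cassert>
#include <cstdint>

constexpr bool isPowerOfTwo(std::uint64_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// Rounds value up to the next multiple of alignment (a power of two).
constexpr std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

struct SbtProperties {
    std::uint32_t shaderGroupHandleSize;
    std::uint32_t shaderGroupHandleAlignment;
    std::uint32_t shaderGroupBaseAlignment;
};

// Offset into the SBT buffer plus the stride and size passed for a region.
struct SbtRegion {
    std::uint64_t offset;
    std::uint64_t stride;
    std::uint64_t size;
};

struct SbtLayout {
    SbtRegion     rayGen;
    SbtRegion     hit;
    SbtRegion     miss;
    std::uint64_t totalSize;
};

SbtLayout computeSbtLayout(const SbtProperties& p, std::uint32_t hitCount,
                           std::uint32_t missCount) {
    assert(isPowerOfTwo(p.shaderGroupHandleAlignment));
    assert(isPowerOfTwo(p.shaderGroupBaseAlignment));
    const std::uint64_t base = p.shaderGroupBaseAlignment;
    const std::uint64_t stride =
        alignUp(p.shaderGroupHandleSize, p.shaderGroupHandleAlignment);

    SbtLayout layout{};
    // The ray generation region holds one record; its size equals its stride.
    const std::uint64_t rayGenSize = alignUp(stride, base);
    layout.rayGen = {0, rayGenSize, rayGenSize};
    // Every region starts on a multiple of the base alignment.
    layout.hit = {rayGenSize, stride, alignUp(hitCount * stride, base)};
    layout.miss = {layout.hit.offset + layout.hit.size, stride,
                   alignUp(missCount * stride, base)};
    layout.totalSize = layout.miss.offset + layout.miss.size;
    return layout;
}

// Index of the hit record chosen for an intersection: the per-instance
// offset plus the geometry index scaled by the stride given to traceRayEXT,
// plus the offset given to traceRayEXT.
constexpr std::uint32_t hitRecordIndex(std::uint32_t instanceSbtOffset,
                                       std::uint32_t geometryIndex,
                                       std::uint32_t sbtRecordStride,
                                       std::uint32_t sbtRecordOffset) {
    return instanceSbtOffset + geometryIndex * sbtRecordStride + sbtRecordOffset;
}
```

Because each region is handed to vkCmdTraceRaysKHR() as its own address, stride and size, the order of the regions within the buffer is a free choice; the order here simply follows the one described for the game later in this post.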

Ray Lifetime - The Bigger Picture

Once the acceleration structure is built, the ray generation shader can be invoked which in turn dispatches a number of rays against the BVH instances. Traversal flags can be specified to perform culling operations which discard potential hit candidates based on properties such as transparency.

Figure 3. Overview of the ray tracing pipeline and its interaction with the acceleration structures through the shader binding table

If an intersection is found (as a result of the intersection test specific to the intersection shader) then a number of culling operations are carried out before the final hit can be confirmed. These operations stem from the flags specified at the beginning of the traversal process. Note that the intersection test is watertight, preventing rays from traversing through the boundaries of triangles or reporting multiple hits for the same set of coordinates on separate geometries.

The traversal process continues until all candidates have been queried and consequently either discarded or confirmed as a hit. When a hit is confirmed, the shaders in the hit group will return information to the ray generation shader through the payload structure identified through the rayPayloadEXT qualifier.
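The lifetime of a single ray can be mimicked on the CPU to show how the stages cooperate. This is a deliberately simplified model, not how a driver traverses a BVH: the candidates are given as a flat list, the any-hit stage is reduced to an alpha test, and Candidate, Payload and traceRay are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical candidate intersection reported during traversal.
struct Candidate {
    float t;           // distance along the ray
    float alpha;       // opacity sampled at the candidate hit point
    int   materialId;  // what the closest hit stage would shade with
};

// Stand-in for the ray payload shared between the stages.
struct Payload {
    bool  hit        = false;
    float hitT       = 0.0f;
    int   materialId = -1;
};

// Any-hit stage: ignore candidates that fail the alpha test.
bool anyHitAccepts(const Candidate& c, float alphaCutoff) {
    return c.alpha >= alphaCutoff;
}

// Visits every candidate, keeps the closest accepted one inside
// [tMin, tMax], then runs either the closest hit or the miss stage.
Payload traceRay(const std::vector<Candidate>& candidates, float tMin,
                 float tMax, float alphaCutoff) {
    assert(tMin <= tMax);
    const Candidate* closest = nullptr;
    float currentMax = tMax;
    for (const Candidate& c : candidates) {
        if (c.t < tMin || c.t > currentMax) {
            continue;  // outside the current ray interval
        }
        if (!anyHitAccepts(c, alphaCutoff)) {
            continue;  // filtered out by the any-hit stage
        }
        closest = &c;
        currentMax = c.t;  // accepting a hit shrinks the interval
    }

    Payload payload;  // defaults describe a miss
    if (closest != nullptr) {
        // Closest hit stage: write the result back through the payload.
        payload.hit = true;
        payload.hitT = closest->t;
        payload.materialId = closest->materialId;
    }
    return payload;
}
```

Note how a nearer candidate that fails the alpha test does not prevent a farther opaque surface from being reported, which is the behaviour alpha-tested geometry relies on.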

Developing a Hybrid Application

Figure 4. Illustration of ray traced reflections off (left hand image) and on (right hand image)

Building the Acceleration Structure

Scene geometry can be represented in the BLAS instances using different precisions, depending on how it will be used and its world space position and orientation. In Wolfenstein: Youngblood, vertex positions use a 16-bit integer format for the static geometries (VK_FORMAT_R16G16B16_SNORM), whereas hair and skin vertex positions use 32-bit floating point precision (VK_FORMAT_R32G32B32_SFLOAT). Indices rely on 16-bit precision throughout all geometry (VK_INDEX_TYPE_UINT16).
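To give a feel for what 16-bit SNORM positions imply, the following sketch encodes and decodes a coordinate using the normalized fixed-point conversion rules from the Vulkan specification. The quantizePosition and dequantizePosition helpers are assumptions: they map a coordinate into [-1, 1] relative to a per-mesh center and half extent, a scale that an engine could fold into the instance transform. The post does not state how the game normalizes its positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Encodes a value in [-1, 1] as a 16-bit signed normalized integer.
std::int16_t encodeSnorm16(float value) {
    const float clamped = std::min(std::max(value, -1.0f), 1.0f);
    return static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
}

// Decodes a 16-bit SNORM value; -32768 and -32767 both map to -1.
float decodeSnorm16(std::int16_t value) {
    return std::max(static_cast<float>(value) / 32767.0f, -1.0f);
}

// Hypothetical helper: stores one coordinate relative to a bounding range.
std::int16_t quantizePosition(float position, float center, float halfExtent) {
    assert(halfExtent > 0.0f);
    return encodeSnorm16((position - center) / halfExtent);
}

// Inverse of quantizePosition, up to the quantization error.
float dequantizePosition(std::int16_t stored, float center, float halfExtent) {
    assert(halfExtent > 0.0f);
    return decodeSnorm16(stored) * halfExtent + center;
}
```

The worst-case error per coordinate is about halfExtent / 32767, which is why this format suits static geometry with bounded extents, while the skin and hair meshes mentioned above keep full 32-bit floats.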

In dynamic scenes, the BLAS may have to undergo an update and in some cases even a full rebuild. Both operations can impact performance depending on the frequency of their invocation as well as the size of the scene geometry referenced within the BLAS. In this title, the BLAS instances referencing skinned/animated objects, along with particles, require an update each frame (Fig. 5). If the number of vertices or indices increases over the initial allocation, then a complete recreation of the BLAS structures is required.

Figure 5. Overview of updating versus rebuilding of the BVH

To reduce the total BLAS instance count, it is advisable to group multiple geometries per BLAS. The hit group in the SBT will recognize individual geometries through the use of gl_GeometryIndexEXT. Moreover, dynamic skinnable geometry such as animated characters can be processed through a compute pipeline which takes the vertex or index buffer along with the transformation matrices as input, and outputs a world-space position vector which is in turn stored in a BLAS instance. This approach improves performance as no buffer data needs to be updated, and the hit group can calculate this position using the ray origin and its extent range.

Titles such as Wolfenstein: Youngblood are characterised by fast-paced dynamic interactions made possible through a vast number of animated and deformable objects. This means that skinnable meshes require far more frequent updates in the AS compared to static geometries, and this could cause a noticeable performance decline. The compute-based skinning approach described above has the benefit that the initial vertex buffer is never modified. Instead, the compute pass resolves the new transformed positions based on the matrices, and the corresponding BLAS is updated with the resulting position values. Moreover, the world-space hit position can be calculated from the ray origin, direction, and hit distance, and therefore there is no need to store this information in a vertex buffer.
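Recovering the hit position from the ray itself is a one-line calculation. The sketch below is the CPU equivalent of evaluating gl_WorldRayOriginEXT + gl_WorldRayDirectionEXT * gl_HitTEXT in a hit shader; Vec3 and hitPosition are illustrative names.

```cpp
#include <cassert>

struct Vec3 {
    float x, y, z;
};

// World-space hit position from the ray origin, direction and hit distance.
Vec3 hitPosition(const Vec3& origin, const Vec3& direction, float hitT) {
    assert(hitT >= 0.0f && "hit distance is measured forward along the ray");
    return {origin.x + direction.x * hitT,
            origin.y + direction.y * hitT,
            origin.z + direction.z * hitT};
}
```

Since all three inputs are available to the hit group for free, nothing about the skinned position has to be written back to a vertex buffer for shading purposes.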

In addition to the optimizations carried out at BLAS instance creation time, culling operations can be used to reduce the number of TLAS instances which require an update. Ideally, the application should never have to hold the entire scene in the TLAS, and instead determine what can be culled and what makes it through to the TLAS, through mechanisms such as an extended camera frustum test or instance size, where the smallest instances are culled at short distance. In Wolfenstein: Youngblood, the heuristic used for TLAS instance culling compares the angular size of the instance’s AABB (in camera space) to an arbitrary threshold (Fig. 6). If this size is greater than the threshold, then the change is significant enough for the instance to make it to the TLAS.
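One possible form of such an angular-size test is sketched below. The post does not give the exact formula used in the game, so this version makes an assumption: it measures the angle subtended by the bounding sphere of the AABB as seen from the camera position. Aabb, angularSize and keepInTlas are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

struct Aabb {
    Vec3 min;
    Vec3 max;
};

constexpr float kPi = 3.14159265f;

float length(const Vec3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Angle in radians subtended by the box's bounding sphere from the camera.
float angularSize(const Aabb& box, const Vec3& cameraPosition) {
    const Vec3 center{(box.min.x + box.max.x) * 0.5f,
                      (box.min.y + box.max.y) * 0.5f,
                      (box.min.z + box.max.z) * 0.5f};
    const Vec3 diagonal{box.max.x - box.min.x,
                        box.max.y - box.min.y,
                        box.max.z - box.min.z};
    const float radius = 0.5f * length(diagonal);
    const Vec3 toCenter{center.x - cameraPosition.x,
                        center.y - cameraPosition.y,
                        center.z - cameraPosition.z};
    const float distance = length(toCenter);
    if (distance <= radius) {
        return kPi;  // camera inside the sphere: treat as maximally large
    }
    return 2.0f * std::asin(radius / distance);
}

// An instance is added to the TLAS only if it appears large enough.
bool keepInTlas(const Aabb& box, const Vec3& cameraPosition,
                float thresholdRadians) {
    assert(thresholdRadians >= 0.0f);
    return angularSize(box, cameraPosition) > thresholdRadians;
}
```

With this formulation a small object close to the camera and a large object far away are treated alike, so a single threshold covers both the size and the distance criteria mentioned above.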

Figure 6. Acceleration structure update when dealing with dynamic data

Ray Traced Reflections

Effects such as screen space reflections are widely used in rasterized applications, and despite their impact, these techniques come with a number of limitations around reflecting off-screen pixels or geometry such as particle systems lacking in depth data. By overcoming these drawbacks, ray traced reflections become a compelling solution. Furthermore, ray tracing enables the modelling of both opaque and transparent reflections without having to compromise performance.

In this hybrid setup, opaque reflections are achieved by using depth data and normal maps from the raster pass in place of a primary ray’s closest hit result. A ray is then traced per pixel from that hit point in the reflected direction of the surface, taking the surface roughness map into account. If the miss shader is invoked, the image-based lighting or environment map is sampled instead.
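Setting up the reflection ray from raster data can be sketched as follows. The surface position would be reconstructed from the depth buffer and the normal read from the normal map; both are taken as inputs here. Ray and makeReflectionRay are hypothetical names, and reflect follows the same formula as the GLSL built-in of that name. Roughness-based perturbation of the direction is left out.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    float x, y, z;
};

struct Ray {
    Vec3 origin;
    Vec3 direction;
};

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Mirrors the incident direction about a unit normal: I - 2 * dot(N, I) * N.
Vec3 reflect(const Vec3& incident, const Vec3& normal) {
    const float d = 2.0f * dot(normal, incident);
    return {incident.x - d * normal.x,
            incident.y - d * normal.y,
            incident.z - d * normal.z};
}

// Builds the reflection ray for one pixel from rasterized surface data.
Ray makeReflectionRay(const Vec3& cameraPosition, const Vec3& surfacePosition,
                      const Vec3& surfaceNormal) {
    const Vec3 view{surfacePosition.x - cameraPosition.x,
                    surfacePosition.y - cameraPosition.y,
                    surfacePosition.z - cameraPosition.z};
    const float len = std::sqrt(dot(view, view));
    assert(len > 0.0f && "camera must not coincide with the surface point");
    const Vec3 incident{view.x / len, view.y / len, view.z / len};
    return {surfacePosition, reflect(incident, surfaceNormal)};
}
```

In practice the ray is usually started with a small minimum distance so that it does not immediately intersect the surface it originates from.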

Figure 7. Ray traced reflections in the case of a uniform surface

When dealing with surface roughness it is safe to reduce the max ray bounds Tmax as it improves traversal performance and does not affect the quality of the resulting reflection as the final result undergoes further sampling and denoising.
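As an illustration only, the ray length could be shortened linearly with roughness. The mapping below is an assumption made for this sketch, since the post does not describe the function or the distances used in the game; reflectionTMax and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical mapping: mirror-like surfaces (roughness 0) trace up to
// maxDistance, fully rough surfaces (roughness 1) only up to minDistance.
float reflectionTMax(float roughness, float minDistance, float maxDistance) {
    assert(minDistance > 0.0f && minDistance <= maxDistance);
    const float r = std::min(std::max(roughness, 0.0f), 1.0f);
    return maxDistance + (minDistance - maxDistance) * r;
}
```

The returned value would be passed as the Tmax argument of traceRayEXT(), so rougher pixels stop traversal earlier and fall through to the miss shader sooner.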

Figure 8. Ray traced reflections on a rough non-uniform surface

The ray generation shader relies on this data to output the reflected radiance and hit distance, which are subsequently integrated into the deferred composite lighting pass yielding the final reflection color. A glossy opaque reflection can be achieved here by blurring the target based on an associated roughness map.

Transparent surfaces are drawn first during the raster pass, and the normal and depth data is used as input to the ray generation shader in order to mask out the opaque areas. In the second raster pass, the mask is used to obtain the blended transparent surfaces layered in the correct order (Fig. 9).

Figure 9. Ray traced reflections for transparent geometry

A significant aspect of ray tracing in Wolfenstein: Youngblood is the setup and use of the SBT. The AS is organised so there is a single geometry per BLAS instance so that the SBT is not required - the InstanceID is used directly to identify the instances. The SBT is structured so there is a single ray generation shader handle followed by a hit group item for each geometry (where each hit group has its own surface ID). The last entry in the SBT is the miss shader handle (Fig. 10).

Figure 10. Organising the shader binding table entries in Wolfenstein: Youngblood

Accurate real time reflections of particles are difficult to achieve in rasterized screen space reflections without compromising performance. In a hybrid setup, the particle geometry is represented by camera-aligned quads that, for performance reasons, are alpha-masked during ray tracing when the alpha value falls below a set threshold.

In order to handle the variety of materials for ray traced reflections, there are two types of ray generation shaders in place, opaque and transparent, which can be used in conjunction with five major types of hit shaders to model the outdoor world objects (static), glass, particles, skin, and hair. The entire ray tracing pipeline is created at the engine initialization stage to avoid the stutter characteristic of shader compilation.

For improving interoperability, the application relies on the buffer device address Vulkan extension (VK_KHR_buffer_device_address) for VB/IB reference storage in a buffer which is later accessed by the hit group shaders. Through this extension a buffer device address value can be queried by the application therefore exposing memory via PhysicalStorageBuffer. This enables pointer-like functionality in the shader.
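What the hit group does with those references can be modelled on the CPU, with ordinary pointers standing in for 64-bit buffer device addresses. GeometryRecord and interpolatePosition are hypothetical names; the barycentric convention matches the built-in triangle intersection, which reports the weights of the second and third vertex.

```cpp
#include <cassert>
#include <cstdint>

struct Vec3 {
    float x, y, z;
};

// Hypothetical per-geometry record. On the GPU the two pointers would be
// buffer device addresses read through PhysicalStorageBuffer references.
struct GeometryRecord {
    const Vec3*          vertices;
    const std::uint16_t* indices;
    std::uint32_t        indexCount;
};

// Fetches the triangle hit by the ray and interpolates its positions.
// u and v are the barycentric weights of vertices 1 and 2; vertex 0 gets
// the remaining weight 1 - u - v.
Vec3 interpolatePosition(const GeometryRecord& geometry,
                         std::uint32_t primitiveId, float u, float v) {
    assert(3 * primitiveId + 2 < geometry.indexCount);
    const Vec3& p0 = geometry.vertices[geometry.indices[3 * primitiveId + 0]];
    const Vec3& p1 = geometry.vertices[geometry.indices[3 * primitiveId + 1]];
    const Vec3& p2 = geometry.vertices[geometry.indices[3 * primitiveId + 2]];
    const float w = 1.0f - u - v;
    return {w * p0.x + u * p1.x + v * p2.x,
            w * p0.y + u * p1.y + v * p2.y,
            w * p0.z + u * p1.z + v * p2.z};
}
```

The same pattern applies to normals and texture coordinates: the instance or geometry index selects the record, the primitive index selects the triangle, and the barycentrics blend the three vertex attributes.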

The hit group often requires access to texture data, so enabling bindless texture support is a priority. Dynamic (non-uniform) indexing into an array of textures is facilitated by the descriptor indexing Vulkan extension (VK_EXT_descriptor_indexing). This is a good solution when there is a need to access a large number of textures, as is the case for ray tracing, whilst avoiding frequent binding changes.

Despite the technique’s impactful visual contribution, this approach to real-time ray traced reflections is not without its challenges when interacting with engine mechanics. Off-screen lighting is one example: the engine restricts lightmap and light cluster data access to the view frustum. This means that lighting data is not available when the hit position is off-screen. The solution in this situation is to fall back to spherical harmonics lighting, even though this could potentially lead to view-dependent lighting artifacts.

Figure 11. Off-screen lighting and real time ray tracing challenges

Concluding Remarks

The techniques developed in Wolfenstein: Youngblood bring together the classic rasterized rendering model and hardware-enabled real-time ray tracing to enhance effects which traditionally have not been possible in an interactive game. This hybrid approach based on the Vulkan API showcases visual fidelity without compromising performance - due in no small part to the Vulkan extension ecosystem.

Numerous challenges which previously had no viable solution can now be tackled. However, the hybrid model does give rise to its own set of potential performance pitfalls, some of which are addressed in this post. One noteworthy takeaway is that optimal acceleration structure management can lead to significant performance improvement. For example, grouping multiple geometries in a single BLAS in Wolfenstein: Youngblood gave a 60% saving in BLAS count, which dramatically speeds up the ray traversal process, taking up a smaller portion of the frame time and increasing the budget for other effects.

Hybrid rendering is just one technique that will be enabled by the ray tracing extensions in Vulkan, and it will likely be widely adopted to significantly increase visual realism in rasterized games. Effectively deploying this technique also points the way to general best practices for effective resource management and processing budget allocation for ray tracing.

