An Introduction to Vulkan Video

In early 2018 the Vulkan Working Group at Khronos started to explore how to seamlessly integrate hardware-accelerated video compression and decompression into the Vulkan API. Today, Khronos is releasing a set of Provisional Vulkan Video acceleration extensions: ‘Vulkan Video’. This blog gives you an overview of Vulkan’s new video processing capabilities. We welcome feedback before the extensions are finalized so that they provide effective acceleration for your video applications!

Vulkan Video adheres to the Vulkan philosophy of providing flexible, fine-grained control over video processing scheduling, synchronization, and memory utilization to the application. Leveraging the existing Vulkan framework enables efficient, low-latency, low-overhead use of processing resources, including distributing stream processing tasks across multiple CPU cores and video codec hardware—all with application portability across multiple platforms and devices ranging from small embedded devices to high performance servers.

Figure 1. Stages in typical Vulkan Video decode and encode applications

To complement the low-level design of Vulkan Video, Khronos plans to add support in the Vulkan SDK with layers for validation and higher-level abstractions that will speed the development of video applications where simple frame-in-frame-out and black-box decoding and encoding are sufficient. This will be complemented by open source Vulkan Video samples for a range of application use cases on Windows and Linux.

The Provisional Vulkan Video extensions closely integrate hardware accelerated video processing with Vulkan’s existing graphics, compute and display functionality. We invite all developers to provide feedback so the finalized Vulkan Video 1.0 extensions can be finely tuned to provide exciting new capabilities for Vulkan applications everywhere!

Vulkan Video Extensions Overview

GPUs typically contain dedicated video decode and encode acceleration engine(s) that are independent from other graphics and compute engines. In fact, some physical devices may support only video decode and/or video encode operations. Consequently Vulkan Video adds video decode and encode queues, the presence of which can be queried by using VkQueueFlagBits.

Also, the field of video codecs is continuously evolving, enabling ever more efficient video compression and decompression through increasingly advanced and domain-specific video coding tools, which results in new codecs and codec extensions. Consequently, Vulkan Video has been designed with flexible support for a wide variety of existing and future codecs by being divided into universal ‘core’ extensions expected to be relevant to all codecs, and codec-specific extensions. The core extensions provide video queue functionality that is codec-independent: VK_KHR_video_queue, VK_KHR_video_decode_queue, and VK_KHR_video_encode_queue.

Figure 2. Vulkan Video core and codec-specific extensions
VP9 and AV1 extensions will be shipped in a future release

This provisional Vulkan Video release also includes three extensions that extend base structures defined by the core video KHR extensions to support H.264 decode, H.264 encode, and H.265 decode.

These EXT extensions do not define API calls; they simply extend data structures. An H.265-encode extension is currently in development, and VP9 decode and AV1 decode/encode extensions are expected to follow in a later release.

As an example, a Vulkan Video implementation that only supports H.264 decoding would only expose support for VK_KHR_video_queue, VK_KHR_video_decode_queue, and VK_EXT_video_decode_h264 extensions, and an application would use all three extensions together to perform H.264 decode operations on that target device.

The standard vkGetPhysicalDeviceQueueFamilyProperties2 API may be used to determine support for codec operations, such as H.265 decode or H.264 encode, by chaining VkVideoQueueFamilyProperties2KHR to retrieve VkVideoCodecOperationFlagsKHR.
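The selection logic described above can be sketched in plain C. This is a minimal illustration, not Vulkan code: the SketchQueueFamily struct, the SKETCH_ constants and their values are hypothetical stand-ins for the data an application would obtain from vkGetPhysicalDeviceQueueFamilyProperties2, and a real application would use the definitions from the Vulkan headers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only (assumptions, not taken
 * from the Vulkan headers). */
#define SKETCH_QUEUE_VIDEO_DECODE_BIT 0x00000020u
#define SKETCH_QUEUE_VIDEO_ENCODE_BIT 0x00000040u
#define SKETCH_CODEC_OP_DECODE_H264   0x00000001u
#define SKETCH_CODEC_OP_DECODE_H265   0x00000002u
#define SKETCH_CODEC_OP_ENCODE_H264   0x00010000u

/* Hypothetical per-family summary filled in from the queue family query. */
typedef struct {
    uint32_t queue_flags;     /* stands in for the queue flag bits */
    uint32_t video_codec_ops; /* stands in for the chained codec operation flags */
} SketchQueueFamily;

/* Returns the index of the first queue family that has queue_bit set and
 * advertises codec_op, or -1 if no family qualifies. */
int sketch_find_video_queue_family(const SketchQueueFamily *families,
                                   size_t count,
                                   uint32_t queue_bit,
                                   uint32_t codec_op)
{
    assert(families != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if ((families[i].queue_flags & queue_bit) == queue_bit &&
            (families[i].video_codec_ops & codec_op) == codec_op) {
            return (int)i;
        }
    }
    return -1;
}
```

A device that exposes only decode would simply return -1 for any encode request, which the application can treat as "fall back to another path".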

Vulkan Video Codec Std C-headers

Video coding experts often analyze video bitstreams to investigate coding artifacts and improve video quality. They work with the codec-specific syntax elements in the bitstream, guided by the codec specification that defines behavioral descriptions of syntax and tools. Vulkan Video makes it easy to recognize API fields corresponding to codec syntax elements or codec-defined terms, without bloating the Vulkan specification with descriptions already well documented in the codec standard specifications.

Codec-specific standard ("Std") C-headers define structures with explicit and derived codec syntax fields in the naming and style convention of the corresponding codec standard specification. These Std structures are used as fields in Vulkan Video codec EXT extension structures. The provisional Vulkan Video release provides the following codec Std headers:

  • vulkan_video_codec_h264std.h: defines structures and types shared by H.264 decode and encode operations.
  • vulkan_video_codec_h264std_decode.h: defines structures used only by H.264 decode operations.
  • vulkan_video_codec_h264std_encode.h: defines structures used only by H.264 encode operations.
  • vulkan_video_codec_h265std.h: defines structures and types shared by H.265 decode and encode operations.
  • vulkan_video_codec_h265std_decode.h: defines structures used only by H.265 decode operations.
  • vulkan_video_codecs_common.h: defines a versioning macro used by other Std headers for version maintenance.

Video Transcoding Example

Video transcoding is often used to transition video content from an older to a newer codec to benefit from improved compression efficiency. It may also be used to convert content to a codec more appropriate for efficient consumption at the target environment. Figure 3 depicts a basic block diagram for video transcoding.

Figure 3. High-level Vulkan Video Transcoding Process

The first phase of video transcoding is decoding an input video bitstream (sequence of bytes) to generate the images that make up the video sequence. Decoding individual images in the bitstream often requires referencing one or more previously decoded images, which must be retained for this purpose in the Decoded Picture Buffer (DPB). Note that some implementations may support using the same image resources for output images and DPB images, while others may require or prefer decoupling the decode operation's output images from the DPB images, for example to use proprietary layouts and store metadata along with DPB images while keeping output images in standard layouts for external consumption. Finally, to arrive at the original video sequence it may be necessary to re-order output images as instructed by the bitstream.
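To make the final re-ordering step concrete, here is a simplified sketch in plain C. It assumes each decoded picture carries an integer display-order number and that holding back a fixed number of pictures is enough to restore display order. The SketchReorderQueue type and its functions are hypothetical, and the actual output rules are defined by each codec specification rather than by this code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_REORDER 16

/* Holds decoded pictures (identified by display order) until they can be shown. */
typedef struct {
    int    pending[SKETCH_MAX_REORDER];
    size_t count;
    size_t depth; /* number of pictures held back before output starts */
} SketchReorderQueue;

void sketch_reorder_init(SketchReorderQueue *q, size_t depth)
{
    assert(depth >= 1 && depth <= SKETCH_MAX_REORDER);
    q->count = 0;
    q->depth = depth;
}

/* Removes and returns the pending picture with the smallest display order. */
static int sketch_pop_earliest(SketchReorderQueue *q)
{
    size_t best = 0;
    for (size_t i = 1; i < q->count; ++i) {
        if (q->pending[i] < q->pending[best]) {
            best = i;
        }
    }
    int order = q->pending[best];
    q->pending[best] = q->pending[--q->count];
    return order;
}

/* Feeds one picture in decode order. Returns 1 and writes *out when a
 * picture becomes ready for display, otherwise returns 0. */
int sketch_reorder_push(SketchReorderQueue *q, int display_order, int *out)
{
    int ready = 0;
    if (q->count == q->depth) {
        *out = sketch_pop_earliest(q);
        ready = 1;
    }
    q->pending[q->count++] = display_order;
    return ready;
}

/* Drains remaining pictures at end of stream. Returns 1 while any remain. */
int sketch_reorder_flush(SketchReorderQueue *q, int *out)
{
    if (q->count == 0) {
        return 0;
    }
    *out = sketch_pop_earliest(q);
    return 1;
}
```

For a decode order of 0, 3, 1, 2, 6, 4, 5 and a depth of 2, pushing every picture and then flushing yields the pictures in display order 0 through 6.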

The second phase of transcoding involves encoding the decoded images with a new codec (or perhaps the same codec with a different set of tools). The encoding process is essentially the reverse of the decoding process: the input is a sequence of images, which may be re-ordered before encoding, and it may be necessary to retain "reconstructed" or decoded versions of the images for reference while encoding the following images. Note that in general, input images are not used for reference in the encoding process to avoid drift when decoding the bitstream at the consumer end since encoding is usually a lossy operation. Transcoding applications pipeline decode and encode operations to reduce the number of decode output / encode input images needed while transcoding.

So, how would we implement this transcoding example using Vulkan Video?

Video Resources & Profiles

The first step of a transcoding application is to allocate the necessary resources. The basic resources for video decode and encode operations use standard Vulkan objects:

  • Video decode input and encode output bitstreams: VkBuffer
  • Video decode output, encode input, and decode/encode DPB images: VkImageView backed by VkImage

Vulkan Video extends VkBufferUsageFlagBits, VkImageUsageFlagBits and VkImageLayout with usage bits and layouts specific to video decode and encode, which applications use to manage video decode and encode resources optimally.

Video codecs typically define "profiles" that are used to advertise the feature set used by a coded bitstream. Codec-compliant hardware decoders often support the full set of profile features so they can process all compliant content. In contrast, hardware vendors may support only selected profile features in a hardware encoder and still generate a compliant bitstream, driven by area and cost considerations while prioritizing key encoding APIs and use cases. The VkVideoProfileKHR structure defines the target video profile:

  • The video codec operation (e.g. H.265-decode or H.264-encode)
  • The YCbCr chroma-subsampling and luma/chroma component bit-depths (e.g. 4:2:0, 8-bit luma/chroma), as video codecs operate on YUV images for coding efficiency
  • The codec-specific video profile (e.g. H.264 Main profile), via a chained EXT structure specific to the codec-operation in use

Resources intended for video operations may have implementation-specific properties and requirements based on the target video profile, and so applications should specify the target video profile when querying properties, or creating various resources (images, buffers, etc.).

The following API call enumerates the VkFormat values supported for video images for a given video codec operation and video profile:

  • vkGetPhysicalDeviceVideoFormatPropertiesKHR

Video Session

Once resources are allocated, the transcoding application creates a video session. The VkVideoSessionKHR video session object provides a context to store persistent state while operating on a particular video stream. Separate instances of VkVideoSessionKHR may be created to concurrently operate on multiple video streams. The following APIs create, destroy, query memory requirements, and bind memory to video session objects:

  • vkCreateVideoSessionKHR
  • vkDestroyVideoSessionKHR
  • vkGetVideoSessionMemoryRequirementsKHR
  • vkBindVideoSessionMemoryKHR

If the application is to support decoding a video bitstream that dynamically changes resolution, to deal with varying network conditions for example, the video session should be created with maximum video stream parameters so that sufficient resources are allocated.

An API is provided for the application to query the capabilities of the implementation, including minimum and maximum limits for certain settings:

  • vkGetPhysicalDeviceVideoCapabilitiesKHR
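As a rough sketch of how the capability limits and the "maximum video stream parameters" advice fit together, the plain C below checks every resolution a stream may switch between against a device's limits and picks the largest width and height for session creation. SketchExtent, SketchVideoCaps and the function name are hypothetical; the real limits come from the capabilities query above, whose structure layout is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t width;
    uint32_t height;
} SketchExtent;

/* Hypothetical subset of the limits an implementation might report. */
typedef struct {
    SketchExtent min_extent; /* smallest coded size accepted */
    SketchExtent max_extent; /* largest coded size accepted */
} SketchVideoCaps;

/* Computes the largest width and height over all resolutions the stream
 * may use. Returns 1 and writes *session_max if every resolution lies
 * within the limits, otherwise returns 0 and leaves *session_max untouched. */
int sketch_pick_session_extent(const SketchVideoCaps *caps,
                               const SketchExtent *resolutions,
                               size_t count,
                               SketchExtent *session_max)
{
    assert(caps != NULL && resolutions != NULL && session_max != NULL);
    assert(count > 0);
    SketchExtent largest = {0, 0};
    for (size_t i = 0; i < count; ++i) {
        const SketchExtent r = resolutions[i];
        if (r.width < caps->min_extent.width ||
            r.height < caps->min_extent.height ||
            r.width > caps->max_extent.width ||
            r.height > caps->max_extent.height) {
            return 0;
        }
        if (r.width > largest.width) {
            largest.width = r.width;
        }
        if (r.height > largest.height) {
            largest.height = r.height;
        }
    }
    *session_max = largest;
    return 1;
}
```

An adaptive stream that moves between 640x360, 1280x720 and 1920x1080 would create its session at 1920x1080 so that no resolution change needs a new session.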

Video Session Parameters

Vulkan Video uses VkVideoSessionParametersKHR objects, created against a given VkVideoSessionKHR instance, to store video parameter sets to control stream processing, e.g. to describe settings that apply to one or more pictures within a stream—such as H.264 sequence and picture parameter sets.

The application may create multiple session parameters objects for a given video session, specifying for each object the maximum number of parameter sets of each kind that it is expected to hold. This allows the user to later add more parameter sets to the same object, subject to certain conditions. Alternatively, the user may create another session parameters object with more storage capacity, and inherit existing parameter sets retained from a previously created session parameters object. This avoids re-translation of parameter sets through the Vulkan API and enables re-using their internal representations across objects.

The following APIs are provided to create, destroy and update video session parameters:

  • vkCreateVideoSessionParametersKHR
  • vkDestroyVideoSessionParametersKHR
  • vkUpdateVideoSessionParametersKHR

Currently, the session parameters object is used to store H.264 SPS and PPS parameter sets, and H.265 VPS, SPS, and PPS parameter sets. For decode operations, the application is expected to parse bitstream segments containing these codec headers to create/update session parameters objects as needed.
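The capacity and inheritance behavior described in this section can be modeled with a toy container in plain C. This is only a conceptual sketch: SketchParamStore and its functions are hypothetical, parameter sets are reduced to a one-byte id, and the two rejection rules (store full, id already present) are assumptions standing in for the "certain conditions" mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_SETS 32

/* Toy model of a session parameters object holding parameter set ids. */
typedef struct {
    uint8_t ids[SKETCH_MAX_SETS];
    size_t  count;
    size_t  capacity; /* fixed when the store is created */
} SketchParamStore;

/* Creates a store with the given capacity. If template_store is not NULL,
 * the new store starts with a copy of its parameter sets. */
void sketch_store_create(SketchParamStore *s, size_t capacity,
                         const SketchParamStore *template_store)
{
    assert(s != NULL && capacity <= SKETCH_MAX_SETS);
    memset(s, 0, sizeof *s);
    s->capacity = capacity;
    if (template_store != NULL) {
        assert(template_store->count <= capacity);
        memcpy(s->ids, template_store->ids, template_store->count);
        s->count = template_store->count;
    }
}

/* Adds a parameter set. Returns 0 if the id is already stored or the
 * store is full (both rules are assumptions of this sketch), else 1. */
int sketch_store_add(SketchParamStore *s, uint8_t id)
{
    for (size_t i = 0; i < s->count; ++i) {
        if (s->ids[i] == id) {
            return 0;
        }
    }
    if (s->count == s->capacity) {
        return 0;
    }
    s->ids[s->count++] = id;
    return 1;
}
```

When a small store fills up, creating a larger one from it carries the existing sets over, so only the new parameter set has to be supplied again.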

Video Decoding Process

Now that the video session is created, decoding can start by parsing the video bitstream into a sequence of individually decodable bitstream segments, as defined by the video codec. Some of these segments carry codec parameter sets that are applicable to multiple pictures in the sequence, as described earlier. Other bitstream segments carry the coded pictures themselves, or coded sub-picture regions (e.g. H.264 slices).

Figure 4. Vulkan Video Decode Process Details

Video decode hardware acceleration is typically needed only for the bitstream segments related to images/pictures or their sub-regions, while segments related to parameter sets are designed for simple CPU-based decoding or parsing. Parameter sets are also designed to efficiently communicate resource requirements for decoding the video bitstream ahead of time, and to determine whether the hardware decoder supports decoding the actual bitstreams or not.

As well as accelerating picture or sub-region decoding, implementations may also utilize various techniques to work around bitstream errors (e.g. caused by corruption during unreliable network transmission). It may also be necessary to store statistics or state related to prior decoding to aid decoding current/future pictures/sub-pictures in the video sequence. Typically, an application will use Vulkan Video for the heavy lifting for picture-level decoding, while handling parsing, resource management and synchronization internally.
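For H.264, the application-side parsing mentioned above usually starts by splitting a byte stream into NAL units and looking at each unit's type. The plain C sketch below assumes an H.264 Annex B byte stream (segments separated by 00 00 01 start codes) and classifies only the types relevant to this walkthrough. The function and enum names are hypothetical, and a real parser also handles emulation prevention bytes and the remaining NAL unit types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SKETCH_SEG_OTHER,
    SKETCH_SEG_SLICE, /* nal_unit_type 1 (non-IDR slice) or 5 (IDR slice) */
    SKETCH_SEG_SPS,   /* nal_unit_type 7, sequence parameter set */
    SKETCH_SEG_PPS    /* nal_unit_type 8, picture parameter set */
} SketchSegmentKind;

/* Returns the offset of the first byte after the next 00 00 01 start code
 * found at or after `from`, or `size` if no usable start code remains.
 * A four-byte 00 00 00 01 start code is matched through its last three bytes. */
size_t sketch_next_nal(const uint8_t *data, size_t size, size_t from)
{
    assert(data != NULL || size == 0);
    for (size_t i = from; i + 3 <= size; ++i) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            return i + 3;
        }
    }
    return size;
}

/* Classifies a segment from its first byte (the H.264 NAL unit header). */
SketchSegmentKind sketch_classify_nal(uint8_t nal_header)
{
    switch (nal_header & 0x1F) { /* low five bits hold nal_unit_type */
    case 1:
    case 5:
        return SKETCH_SEG_SLICE;
    case 7:
        return SKETCH_SEG_SPS;
    case 8:
        return SKETCH_SEG_PPS;
    default:
        return SKETCH_SEG_OTHER;
    }
}
```

In this split, SPS and PPS segments would be parsed on the CPU and stored in a session parameters object, while slice segments are handed to the hardware decode operation.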

Video Decode Operation Command

Now, it is finally time to record the video decode operation into a Vulkan command buffer using:

  • vkCmdDecodeVideoKHR

This is the only API call provided in the VK_KHR_video_decode_queue extension. Command buffers and bitstream data are built for the video device in memory before submission to the GPU.

Currently only picture-level decode commands are supported (as specified by the appropriate codec-specific EXT extension structures for decode operations, for example VkVideoDecodeH264PictureInfoEXT). We are interested to hear of use cases that need to request more fine-grained operations!

Video Encoding Process

Now that we have the decoded images, we can encode them. Encoding involves similarly detailed tasks to decoding, but with considerably more decision points (Figure 5). At the sequence level the application can configure the target bitrate for the generated bitstream. Implementations employ proprietary algorithms to assess picture complexity and budget the bit allocation across pictures and within sub-regions of each picture. Commonly known as "rate control", this feature also necessitates storing statistics and state that may be utilized while encoding future pictures of the sequence.
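The arithmetic behind a bit budget can be illustrated with a deliberately naive sketch in plain C. It is not the algorithm of any implementation, which the text above notes are proprietary: it derives an average budget per picture from a target bitrate and frame rate, then splits a total budget in proportion to per-picture complexity weights. All names and the weighting scheme are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Average number of bits available per picture at a given target bitrate. */
uint64_t sketch_bits_per_picture(uint64_t bits_per_second,
                                 uint32_t frames_per_second)
{
    assert(frames_per_second > 0);
    return bits_per_second / frames_per_second;
}

/* Splits total_bits across pictures in proportion to their complexity
 * weights. Integer division rounds down, so a few bits may stay unallocated. */
void sketch_allocate_bits(uint64_t total_bits, const uint32_t *weights,
                          size_t count, uint64_t *bits_out)
{
    assert(weights != NULL && bits_out != NULL);
    uint64_t weight_sum = 0;
    for (size_t i = 0; i < count; ++i) {
        weight_sum += weights[i];
    }
    assert(weight_sum > 0);
    for (size_t i = 0; i < count; ++i) {
        bits_out[i] = total_bits * weights[i] / weight_sum;
    }
}
```

For example, a 6,000,000 bit/s target at 30 pictures per second averages 200,000 bits per picture; giving an intra picture a weight of 4 against two predicted pictures with weight 1 would split a 600,000 bit budget as 400,000, 100,000 and 100,000.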

Figure 5. Vulkan Video Encode Process Details

As part of the encoding process, decisions must also be made regarding which codec tools to use when encoding each picture or sub-picture region, and which other pictures should be referenced while encoding. Decisions may even be applied at the lowest-level coding units (e.g. 16x16 pixel blocks) for which bitstream syntax may be specified (as defined by the codec). The appropriate parameter sets must be coded in addition to the bitstreams for pictures or sub-picture regions to generate the final elementary video bitstream.

Encoder implementations may vary in the set of codec tools offered, and the level of detailed control exposed to the user. Similarly, user expectations vary significantly for encode; some users prefer black-box encoders that are simply fed images and some high-level settings with all detailed syntax being generated under-the-hood. Some advanced users may desire more control of the low-level encoding process to enable domain-specific optimizations in the application.

Vulkan Video is a result of balancing these requirements, resulting in a low-level API to encourage broad silicon vendor adoption, while relying on tools and layers to hide complexity from applications that prefer a higher-level API. Vulkan Video enables vendor extensions that expose vendor-specific controls which may be standardized if there is cross-vendor support.

Figure 5 also illustrates some of the additional Vulkan Video commands and queries introduced, which are described next.

Video Encode Operation Commands

Now we are ready to start the encoding process by recording the video encode operation into a Vulkan command buffer:

  • vkCmdEncodeVideoKHR

This is the only API call provided in the VK_KHR_video_encode_queue extension.

Currently only picture-level encode commands are supported (as specified by the appropriate codec-specific EXT extension structures for encode operations, e.g. VkVideoEncodeH264VclFrameInfoEXT). In the future, the encode command may support encoding of a sub-picture region by itself (e.g. a single slice in a multi-slice frame in H.264).

All picture and reference management decisions are left to the application, which also has direct control over bitstream syntax related to reference management. In addition, the application may optionally request generation of H.264 SPS/PPS bitstream segments by the implementation (see VkVideoEncodeH264EmitPictureParametersEXT). This provides a path for implementations to generate a complete elementary bitstream if needed.

Encoder rate control settings are recorded using the following API into a Vulkan command buffer:

  • vkCmdControlVideoCodingKHR

Note that these settings take effect in the execution timeline (i.e. at queue submission). This API also allows resetting the video session to the initial state, for example if the video session will be used to process a new video stream. This generic API hook enables future extensions for other stream-level control operations.

Video Command Buffer Context

As a number of decode or encode operations may be recorded in the same command buffer, all relying on the same set of resources and settings, Vulkan Video defines a pair of API calls to mark the scope of video command control parameters during a session:

  • vkCmdBeginVideoCodingKHR
  • vkCmdEndVideoCodingKHR

vkCmdBeginVideoCodingKHR sets up the command buffer context for video operations on a single video stream. The VkVideoSessionKHR object is provided at this point, along with the VkVideoSessionParametersKHR object containing parameter sets for use in all subsequent video decode or encode operations until the end of scope. One or more vkCmd*Video*KHR commands are expected after this, specifying the actual decode/encode operation(s) and/or video control operation(s). Standard Vulkan commands for synchronization, layout transitions, etc. may also be present along with the video commands. vkCmdEndVideoCodingKHR ends the scope of video operations.

Multiple sets of video commands delimited by the vkCmdBeginVideoCodingKHR and vkCmdEndVideoCodingKHR commands may be recorded into the same command buffer, using either the same or different video session and video session parameters objects for each set. It is also possible to use a video session parameters object with the corresponding video session object in multiple command buffer recording calls, that is, when recording into multiple command buffers in parallel.
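The scoping rule described in this section can be expressed as a small checker in plain C. This is a conceptual sketch over a hypothetical list of command kinds, not a replacement for validation layers: it accepts a sequence only if every decode, encode or control command falls between a begin and an end, and it assumes that scopes do not nest, which the text above does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the commands recorded into one command buffer. */
typedef enum {
    SKETCH_CMD_BEGIN_CODING, /* stands for vkCmdBeginVideoCodingKHR */
    SKETCH_CMD_END_CODING,   /* stands for vkCmdEndVideoCodingKHR */
    SKETCH_CMD_DECODE,       /* stands for vkCmdDecodeVideoKHR */
    SKETCH_CMD_ENCODE,       /* stands for vkCmdEncodeVideoKHR */
    SKETCH_CMD_CONTROL,      /* stands for vkCmdControlVideoCodingKHR */
    SKETCH_CMD_OTHER         /* barriers, layout transitions, and so on */
} SketchCmd;

/* Returns 1 if all video commands sit inside a begin/end scope, scopes do
 * not nest (an assumption of this sketch), and the last scope is closed. */
int sketch_scopes_valid(const SketchCmd *cmds, size_t count)
{
    assert(cmds != NULL || count == 0);
    int in_scope = 0;
    for (size_t i = 0; i < count; ++i) {
        switch (cmds[i]) {
        case SKETCH_CMD_BEGIN_CODING:
            if (in_scope) return 0;
            in_scope = 1;
            break;
        case SKETCH_CMD_END_CODING:
            if (!in_scope) return 0;
            in_scope = 0;
            break;
        case SKETCH_CMD_DECODE:
        case SKETCH_CMD_ENCODE:
        case SKETCH_CMD_CONTROL:
            if (!in_scope) return 0;
            break;
        case SKETCH_CMD_OTHER:
            break; /* allowed inside or outside a scope in this sketch */
        }
    }
    return !in_scope;
}
```

A transcoding command buffer would typically contain one scope with decode commands followed by a second scope with control and encode commands, each scope bound to its own video session.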

Video Queries

Vulkan Video adds a new mandatory VkQueryType to report the location and size of the encoded bitstream in the output buffer (see VK_QUERY_TYPE_VIDEO_ENCODE_BITSTREAM_BUFFER_RANGE_KHR).

In addition, an optional result status query type is added to determine the completion status of a set of operations enclosed between the vkCmdBeginQuery and vkCmdEndQuery commands. This result status may be reported by itself using the VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR query type, or in conjunction with another query type using VK_QUERY_RESULT_WITH_STATUS_BIT_KHR. The result status is not specific to video operations, and may be used to report errors during execution of any Vulkan commands that require additional investigation. For video operations, implementations may report error status when decoding syntax errors are encountered, or when the encode-bitstream buffer overflows.

As video queries are generally consumed by the host, video queues only support host translation of query results (vkGetQueryPoolResults), and do not support device translation (vkCmdCopyQueryPoolResults). Please let us know if device translation is important for your use case!

And this concludes the transcoding example walkthrough! We hope this has given you a taste of how Vulkan Video could enable new capabilities in your own products by integrating low-level video acceleration in sophisticated Vulkan pipelines combining video, graphics, compute and display operations.

Call for Feedback!

The release of the Provisional Vulkan Video extensions marks the first public exposure of this significant new Vulkan functionality and is an important milestone to enable industry review and feedback. Please share your thoughts through the Khronos Vulkan Video GitHub Issue.

As with any provisional release, the Vulkan Video extensions may be updated in response to developer feedback. We therefore request that driver vendors not ship production drivers supporting these provisional extensions, and that ISVs not use these provisional extensions in their production applications. To use these provisional extensions, applications must explicitly enable beta extensions as follows:

#define VK_ENABLE_BETA_EXTENSIONS
#include <vulkan/vulkan.h>

Vulkan Video provisional extensions specification links:

NVIDIA has released beta Vulkan drivers that implement Vulkan Video, and a sample Vulkan Video decoding application vk_video_decoder to enable developers to prototype and experiment against the current provisional extensions.

Vulkan SDK validation layer support will be added for the finalized Vulkan Video 1.0 extensions. For this provisional release, validation layers will only be verified to work with Vulkan Video extensions disabled.

Khronos will now work to finalize the Vulkan Video 1.0 specifications, SDK and conformance tests, so focus can then shift towards supporting additional codecs and more advanced video features!

We look forward to your feedback on Vulkan Video. Thank you for your interest and support to make Vulkan Video effective for your use cases and applications!
