Copying Images on the Host in Vulkan
The Vulkan Working Group has released the VK_EXT_host_image_copy extension, allowing copies to and from images to be done on the host rather than the device.
Vulkan already provides functions to copy between buffers and images through vkCmdCopyBufferToImage, vkCmdCopyImageToBuffer, and vkCmdCopyImage (later made extensible in VK_KHR_copy_commands2 and Vulkan 1.3). These functions are essential because the physical layout (otherwise known as memory swizzling) of an image created with VK_IMAGE_TILING_OPTIMAL is opaque to the application, so the application cannot meaningfully copy to and from such an image by mapping its device memory on the host. What’s more, its device memory may not be host-mappable to begin with. The Vulkan implementation, however, is capable of copying to and from these types of images with hardware-accelerated swizzling.
These functions work well when the image data is available or needed in device memory and it is desirable to do the copy on the device timeline. There are, however, common scenarios where that is not true.
- A game that is streaming image data from disk, a map application that is streaming image data from its servers, or a browser that is streaming image data for a web page are examples where the image data is not readily available on the device.
- Similarly, an image processing application storing image data to disk, or a game engine that is baking GPU-rendered images are examples where the image data is not needed on the device.
- Finally, the data copy may not necessarily best be done on the device. The device may be busy with other work, or it may be desirable to do the copy on the CPU timeline.
Additionally, whenever the data is available or needed in host memory, a buffer↔image copy can incur two penalties:
- Double peak GPU memory usage: For a (brief) period between the allocation of the buffer to hold the image data and the time it can be freed (after the device is finished with the copy), the amount of allocated GPU memory is about double what is needed; there is the memory for the images themselves and about the same amount of memory for the buffers. This particularly affects scenarios such as a game uploading a large amount of image data during a load screen where said brief period contains lots of image copy operations.
Note that applications can reduce peak memory usage by making incremental submissions as image copy operations are being recorded, allowing them to reclaim buffer memory earlier, potentially while more image copy operations are being recorded and executed. However, the memory overhead cannot be entirely eliminated.
- Double copy: Ideally, the application would map the buffer memory and stream directly into it. Unfortunately this is not always possible, typically due to the architecture of the application and the decoupling of the streaming and GPU modules. In that case, the data is available in host memory and needs to be copied (by the host) to the buffer first, and then copied again to the image (by the device).
The recently released VK_EXT_host_image_copy extension aims to address these inefficiencies.
Image Copy from Host Memory
To copy data to a supported image directly from host memory using only the CPU, applications can use the new vkCopyMemoryToImageEXT function. The parameters to this function are analogous to vkCmdCopyBufferToImage2, except they take a generic host pointer instead of a VkBuffer handle and offset.
If the image data is already swizzled in host memory according to the image’s actual physical layout on the device, it can be copied more efficiently using the VK_HOST_IMAGE_COPY_MEMCPY_EXT flag. This can be useful on fixed hardware (such as a game console) where the physical layout of images may be known, or if the image data has previously been copied to host memory using the same flag.
Image Copy to Host Memory
Similarly, to copy data from a supported image to host memory using only the CPU, applications can use the new vkCopyImageToMemoryEXT. The parameters to this function are analogous to vkCmdCopyImageToBuffer2, except they take a generic host pointer instead of a VkBuffer handle and offset.
Using the VK_HOST_IMAGE_COPY_MEMCPY_EXT flag, the image data can be obtained while retaining the physical layout of the image. This can be used for example by a game engine to bake image data pre-swizzled for a specific device, or by a game to cache additionally downloaded content in a form that is faster to load in future runs (by copying image data to a temporary image and reading it back using this flag). It is important to associate pre-swizzled data with VkPhysicalDeviceHostImageCopyPropertiesEXT::optimalTilingLayoutUUID, as the physical layout of images can change with driver updates.
When using VK_HOST_IMAGE_COPY_MEMCPY_EXT to copy from an image created with VK_IMAGE_TILING_OPTIMAL, the application can query the memory size required to hold the image data by chaining a VkSubresourceHostMemcpySizeEXT structure to the VkSubresourceLayout2EXT passed to vkGetImageSubresourceLayout2EXT.
Image to Image Copy on Host
Analogous to vkCmdCopyImage2, applications can use the new vkCopyImageToImageEXT function to copy between images with identical creation parameters, using only the CPU. This can allow an application to defragment memory on the CPU instead of flooding the GPU with extra copy operations that would affect frame rendering.
Setting Up for Host Copy
To perform image copies on the host, where supported, an image needs to be set up to allow it. The image must first be created with the VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT flag. Note that setting this flag has two ramifications:
- The memory type requirements of the image may change. For example, an image that may otherwise not require its memory to be host-visible may require it when created with this flag. This can limit the amount of memory available for the application’s images as well as affect their performance, so it is not advisable to set the flag unconditionally on all images on all platforms. This is particularly the case for non-UMA (Unified Memory Architecture) devices. On UMA devices, this flag typically does not affect the image memory type requirements. Regardless of the type of device, the VkPhysicalDeviceHostImageCopyPropertiesEXT::identicalMemoryTypeRequirements property reports whether that is the case.
Applications can still choose to use the VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT flag in spite of such a limitation for images where the benefits from being host-copyable outweigh the costs.
- The physical layout of the image may change given this flag, which may slightly reduce the performance of device accesses. For this reason as well, it is not advisable to set the flag unconditionally on all images on all platforms. VkHostImageCopyDevicePerformanceQueryEXT can be used to query the implications of using the flag. In this structure, identicalMemoryLayout is a boolean that is true when the physical layout of the image is not affected by the VK_IMAGE_USAGE_HOST_TRANSFER_BIT_EXT flag. optimalDeviceAccess is a boolean that is true when device access performance is not adversely affected: either the layout is unchanged, or it is changed (identicalMemoryLayout is false) but the impact on the performance of the image on the device is negligible (as determined by each implementation). Note that optimalDeviceAccess is guaranteed to be true for block-compressed formats.
Applications can still choose to use this flag despite a loss in performance for select images, especially for images that are used infrequently. For example, a browser that samples from an image once and caches the rendered results may prefer more efficient initialization of the image over maximizing sampling performance.
To be used as the source of a copy operation, an image must be in one of the layouts indicated in VkPhysicalDeviceHostImageCopyPropertiesEXT::pCopySrcLayouts. To be used as the destination of a copy operation, an image must be in one of the layouts indicated in VkPhysicalDeviceHostImageCopyPropertiesEXT::pCopyDstLayouts. Both of these lists always include VK_IMAGE_LAYOUT_GENERAL. On some implementations, some image layouts that are too expensive to swizzle on the host may be excluded from these lists.
To support performing image copies on the host without involving the device, image layout transitions can be performed on the host using the new vkTransitionImageLayoutEXT function. Not all layouts are supported, however. The oldLayout parameter of the transition can only be VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_PREINITIALIZED, or one of the layouts in VkPhysicalDeviceHostImageCopyPropertiesEXT::pCopySrcLayouts. The newLayout parameter can only be one of the layouts in VkPhysicalDeviceHostImageCopyPropertiesEXT::pCopyDstLayouts (guaranteed to include at least VK_IMAGE_LAYOUT_GENERAL).
Used together, host copies and layout transitions allow an application to place a newly created image in the layout in which it is intended to be used on the device and initialize its contents, all on the host. Note, however, that if an image is in a layout that cannot be used in a host copy operation, vkTransitionImageLayoutEXT can only move it to a usable layout by transitioning from VK_IMAGE_LAYOUT_UNDEFINED, which destroys the image’s contents.
When using VK_HOST_IMAGE_COPY_MEMCPY_EXT, image data can be copied to and from the device in a format matching the physical layout of the image on the device. Copies made with this flag are expected to be as fast as a memcpy to or from the type of memory the image is stored in. However, the physical layout of images depends on the specific device and driver version, so pre-swizzled image data is not portable. The VkPhysicalDeviceHostImageCopyPropertiesEXT::optimalTilingLayoutUUID value can be used to determine if previously retrieved pre-swizzled data is still valid for the device.
It is not guaranteed that all images can be copied on the host. As mentioned previously, the image memory must be mappable by the driver, which may limit the memory types available for the image. Additionally, certain layouts may be expensive to swizzle on the host, incurring too much performance overhead.
The following are additional restrictions to copying image data on the host:
- The VK_FORMAT_FEATURE_2_HOST_IMAGE_TRANSFER_BIT_EXT feature flag must be supported for the format. This flag is guaranteed to be supported on formats that support VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT.
- The vkGetPhysicalDeviceImageFormatProperties2 query may still fail under certain limited circumstances (such as with certain DRM format modifiers), so applications should generally be prepared to fall back to using device copy operations.
- Host image copies do not synchronize with the device. It is the responsibility of the application to ensure that device reads and writes are synchronized with host reads and writes. Simultaneous device and host reads are allowed and behave as expected.
We are excited about the optimization opportunities this extension provides, such as faster loading times, less stuttering while streaming or defragmenting assets, as well as reduced latency and power consumption.
We look forward to seeing how developers use this new functionality. If you have questions or want to discuss use cases we haven't mentioned here, start a discussion on Vulkan Discord, or add your comments below.