Comparing the Vulkan SPIR-V memory model to C++’s

The Vulkan/SPIR-V memory model was built on the foundation of the C++ memory model, but ended up diverging in a number of places.

A lot of how GPU programming models work across modern graphics APIs has evolved through years of development, reflecting the markets that those APIs have targeted. Naturally, the Vulkan/SPIR-V memory model has made several decisions that reflect this. We added several new facets to the model, including scopes, storage classes, and memory availability and visibility operations to name some of the more prominent ones.

However, it is not a strict superset either: a few features have been omitted for similar reasons. For example, sequential consistency is not supported, and forward progress guarantees are limited.

This post aims to give a high-level overview of the differences, explaining what the differences are, why they are different, and how (if at all) C++ concepts can map to the Vulkan/SPIR-V memory model. It is aimed primarily at people already familiar with the C++ memory model who either want some insight into what the differences are or are curious about why we took the direction we did.

The Differences

Each subsection here describes a difference, explains why it exists under Justification, and outlines how C++ concepts could be mapped to SPIR-V under Mapping C++ to SPIR-V. The fundamental differences are listed first, with things removed from the C++ memory model called out at the end.

Memory Locations and References

C++ defines a single variable (or a whole bitfield) as a memory location - there’s no real concept of allocating memory separately from having that memory available as a usable variable.

The Vulkan memory model defines two concepts instead of one:

Memory locations

Represent the underlying memory allocated by an implementation.

References

Variables or handles which the underlying memory can be accessed through.

Justification

In Vulkan, memory allocations have completely separate lifetimes from the resources or variables that are used to read values from and write values to those memory allocations.

Mapping C++ to SPIR-V

C++ variables should be considered as both a reference and a memory location in this memory model, rather than just a memory location as in the C++ memory model. A pointer or C++ reference (&) is then an additional reference to that same memory location. A pointer variable also has a memory location of its own, along with a reference to that new memory location.
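A minimal C++ sketch of this reading, where one variable contributes a memory location plus a reference, and a C++ reference and a pointer add further references to that same location. The function name is hypothetical and exists only to illustrate the terminology.

```cpp
#include <cassert>

// Hypothetical illustration: three references, one memory location.
inline int write_through_three_references() {
    int value = 1;        // a memory location, plus the reference `value`
    int& ref = value;     // a second reference to the same memory location
    int* ptr = &value;    // `ptr` has its own memory location and reference;
                          // `*ptr` is a third reference to value's location
    ref += 10;            // write through the second reference
    *ptr += 100;          // write through the third reference
    assert(&ref == ptr);  // every reference names the same location
    return value;         // observes both writes
}
```

Calling the function returns 111, since both writes land in the single memory location behind `value`.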

Availability and Visibility

The largest difference between the C++ memory model and that of Vulkan/SPIR-V is the inclusion of availability and visibility operations.

Availability operations ensure that values written to a memory location in one thread can be made available to other threads. Visibility operations guarantee that values which are available are made visible to a thread, to ensure that the correct values are read. For typical devices supporting Vulkan, availability and visibility operations will map to cache control (e.g. flushes/invalidates/bypass).

To avoid a data race, the following operations are needed for each type of hazard, in addition to the usual synchronization required in the C++ memory model:

Read-after-Write

Requires the write to be first made available and then visible before the read.

Write-after-Write

Requires the first write to be made available before the second write.

Write-after-Read

Does not require availability or visibility operations.

Read-after-Read

Not a hazard.
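The hazard rules above can be summarized as a small lookup. This is only an illustrative sketch: the type and function names are hypothetical, and it models the extra availability and visibility requirements only, not the usual synchronization that is also required.

```cpp
#include <cassert>

enum class Hazard { ReadAfterWrite, WriteAfterWrite, WriteAfterRead, ReadAfterRead };

struct RequiredOps {
    bool availability;  // the first access must be made available
    bool visibility;    // and then made visible to the second access
};

// Extra operations each hazard needs on top of the usual synchronization.
inline RequiredOps required_ops(Hazard h) {
    switch (h) {
        case Hazard::ReadAfterWrite:  return {true, true};
        case Hazard::WriteAfterWrite: return {true, false};
        case Hazard::WriteAfterRead:  return {false, false};
        case Hazard::ReadAfterRead:   return {false, false};  // not a hazard
    }
    return {false, false};  // unreachable for valid enum values
}

// True if the operations actually performed cover what the hazard needs.
inline bool hazard_covered(Hazard h, bool made_available, bool made_visible) {
    const RequiredOps need = required_ops(h);
    return (!need.availability || made_available) &&
           (!need.visibility || made_visible);
}

inline void hazard_examples() {
    assert(!hazard_covered(Hazard::ReadAfterWrite, true, false));  // visibility missing
    assert(hazard_covered(Hazard::WriteAfterWrite, true, false));
    assert(hazard_covered(Hazard::WriteAfterRead, false, false));
}
```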

The memory model includes the ability to express availability and visibility operations either for specific memory accesses at the point they are performed, or for a set of accesses at once using a synchronization operation such as a memory barrier. The former is intended as a directive for an implementation to bypass a cache or use cache coherence, and should typically be used when only a handful of small accesses are performed. The latter option is intended as a directive to instead perform bulk cache flushes/invalidates, which may be better when a large number of accesses to neighboring memory locations are performed in the same thread.

Availability and visibility operations are ordered by release and acquire semantics in the same way as stores and loads.

Justification

C++ evolved on systems with coherent CPU caches, and thus it is natural for cache maintenance operations not to be explicitly exposed. When C++ is compiled to a system with noncoherent caches, the appropriate cache maintenance operations can be folded into release and acquire operations.

Vulkan and SPIR-V evolved on systems with many different execution units where each execution unit’s cache may not be coherent with any others, and thus exposing cache maintenance is a natural part of the API/language and can be a source of optimizations.

Without explicit availability and visibility operations, hazards that don’t need certain cache maintenance operations (e.g. Write-after-Read needs none) would still be bound to perform them, which would incur a performance penalty.

Mapping C++ to SPIR-V

As C++ does not have availability and visibility operations, the simplest way to handle these when translating from C++ to Vulkan SPIR-V is to add the MakeAvailable and MakeVisible semantics automatically to synchronization instructions based on the memory ordering, as follows:

C++ Memory Order         SPIR-V Memory Semantics
memory_order_relaxed     None
memory_order_consume     Promote to memory_order_acquire [1]
memory_order_acquire     MakeVisible | Acquire
memory_order_release     MakeAvailable | Release
memory_order_acq_rel     MakeAvailable | MakeVisible | AcquireRelease
memory_order_seq_cst     No mapping available [2]

[1] There are no 'consume' semantics in SPIR-V, but it is safe to promote these to memory_order_acquire for the purposes of mapping. A C++ translator may be able to move memory operations around to reduce the number of memory operations unnecessarily synchronized by this promotion.

[2] See No "Sequentially Consistent" Semantics.

As SPIR-V presents two options for where to mark availability and visibility operations, a C++ compiler could also choose to attach availability and visibility semantics to stores and loads, respectively, as described above - however, the expectation is that bulk operations will generally be more suitable for typical C++ code.
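As a rough sketch, the mapping table above could be written as a translator helper like the one below. The helper and struct names are hypothetical; the bit values are those of the SPIR-V Memory Semantics enumeration. A real translator would also need to OR in the storage class bits (for example UniformMemory) that the semantics should apply to, which this sketch leaves out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Bit values from the SPIR-V Memory Semantics enumeration.
constexpr std::uint32_t kAcquire        = 0x2;
constexpr std::uint32_t kRelease        = 0x4;
constexpr std::uint32_t kAcquireRelease = 0x8;
constexpr std::uint32_t kMakeAvailable  = 0x2000;
constexpr std::uint32_t kMakeVisible    = 0x4000;

struct SemanticsMapping {
    bool supported;      // false where the table has no mapping
    std::uint32_t mask;  // SPIR-V memory semantics bits when supported
};

// Hypothetical translator helper following the table above.
inline SemanticsMapping to_spirv_semantics(std::memory_order order) {
    switch (order) {
        case std::memory_order_relaxed: return {true, 0u};
        case std::memory_order_consume:  // promoted to acquire
        case std::memory_order_acquire: return {true, kMakeVisible | kAcquire};
        case std::memory_order_release: return {true, kMakeAvailable | kRelease};
        case std::memory_order_acq_rel:
            return {true, kMakeAvailable | kMakeVisible | kAcquireRelease};
        case std::memory_order_seq_cst: return {false, 0u};
    }
    return {false, 0u};  // unreachable for valid enum values
}

inline void mapping_examples() {
    assert(to_spirv_semantics(std::memory_order_release).mask ==
           (kMakeAvailable | kRelease));
    assert(!to_spirv_semantics(std::memory_order_seq_cst).supported);
}
```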

Note
If a C++ translator reuses an externally allocated memory location (e.g. a buffer object) for another object, care should be taken to ensure availability and visibility operations are performed appropriately to avoid data races caused by such reuse.

Aliasing

Two or more references that refer to the same underlying memory location are said to alias. If references are known to alias, then the compiler will not reorder accesses via those references against each other.

By default, references are assumed to not alias. References can be explicitly marked as aliasing with the Aliased decoration, where all references decorated with Aliased are treated as if they alias each other. Additionally, any reference created within a function from another reference is known to alias with the original reference for the duration of that function.

For descriptor references (e.g. buffer or image resources) whose aliasing is described wholly outside of the SPIR-V program (i.e. they are different descriptors), writes made to one alias cannot be made available or visible to the other alias within that SPIR-V program. However, accesses are still ordered in the same way, such that WAR hazards can still be mitigated with the appropriate use of the Aliased decoration and Acquire/Release semantics.

Justification

Aliasing is not assumed by default, as shaders are usually very much part of the critical path, and have typically been written assuming no aliasing.

Descriptors that reference overlapping memory locations do not necessarily share a common virtual address. For devices with virtually-tagged cache hierarchies, coherence between accesses to different descriptors would therefore require flushing out to RAM, which may be impossible on some architectures and would at best be a performance regression compared to other similar APIs on others.

Mapping C++ to SPIR-V

Memory accesses should typically be decorated as aliasing if a C++ to SPIR-V translator also believes them to possibly alias. Such translators need to handle externally defined aliases carefully: if this nuance is not exposed to the application directly in some way, external aliasing should be avoided altogether.
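To see why a translator has to carry C++'s aliasing assumptions across, consider the small sketch below (function names are hypothetical). In C++, two pointers to int may refer to the same object, so the second read of `*src` must observe the first store. If the corresponding SPIR-V references were left at the default of "not aliased", an implementation would be free to reuse the first loaded value.

```cpp
#include <cassert>

// If `dst` and `src` may alias, `*src` has to be reloaded after the first
// store. Under a "no alias" assumption the first load could be reused.
inline int add_twice(int* dst, const int* src) {
    *dst += *src;
    *dst += *src;  // sees the updated value when dst == src
    return *dst;
}

inline void aliasing_examples() {
    int a = 1, b = 1;
    assert(add_twice(&a, &b) == 3);  // distinct locations: 1 + 1 + 1
    int c = 1;
    assert(add_twice(&c, &c) == 4);  // aliased: (1 + 1), then (2 + 2)
}
```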

Distinct Storage Classes and Inter-Thread Happens Before

Vulkan defines multiple different ways to access memory in a shader such as buffers and images - described as resources and descriptors in the Vulkan specification.

SPIR-V denotes these different accesses as Storage Classes, and synchronization instructions define a list of Storage Classes in their Semantics on which they operate. Memory accesses are not affected by barriers that don’t specify the access's storage class in their semantics, in the same way that a Private Memory Access is not affected by any barriers.
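A simplified sketch of that rule is shown below. The struct and function names are hypothetical; the storage class bits are taken from the SPIR-V Memory Semantics enumeration (UniformMemory covers the Uniform and StorageBuffer storage classes). The model ignores scopes and everything else a real barrier carries.

```cpp
#include <cassert>
#include <cstdint>

// Storage class bits from the SPIR-V Memory Semantics enumeration.
constexpr std::uint32_t kUniformMemory   = 0x40;
constexpr std::uint32_t kWorkgroupMemory = 0x100;
constexpr std::uint32_t kImageMemory     = 0x800;

struct Access {
    std::uint32_t storage_class_bit;  // which class the access belongs to
    bool non_private;                 // private accesses ignore barriers
};

// Simplified model: a barrier only orders non-private accesses whose
// storage class is named in the barrier's semantics mask.
constexpr bool barrier_affects(std::uint32_t barrier_semantics, Access a) {
    return a.non_private && (barrier_semantics & a.storage_class_bit) != 0;
}

inline void storage_class_examples() {
    const Access buffer_write{kUniformMemory, true};
    const Access image_write{kImageMemory, true};
    const Access private_write{kUniformMemory, false};
    const std::uint32_t barrier = kUniformMemory | kWorkgroupMemory;
    assert(barrier_affects(barrier, buffer_write));
    assert(!barrier_affects(barrier, image_write));    // class not listed
    assert(!barrier_affects(barrier, private_write));  // private access
}
```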

However, as some barriers may affect accesses to some storage classes and not others, the memory model needed a way to describe the interaction of these barriers. This leads to a "templated" definition of inter-thread happens before, where a set of storage classes is a part of that definition and inter-thread happens before relations with different sets of storage classes do not interact to form happens before relations.

OpenCL faced a similar problem and ended up with disjoint local-happens-before, global-happens-before, and disjoint image access. The templated inter-thread happens before relation handles the same issue, but is able to express ordering between operations in different storage classes.

Justification

C++ allocations are largely homogeneous - a pointer to a variable doesn’t change how that variable is accessed in any way compared to accessing it via the original variable.

Due to the methods of access being different for each type of resource, graphics devices have historically been created with unique cache hierarchies where different storage classes have their own caches and access hardware. Graphics and GPU-centric compute APIs have typically made the same distinction, leading to these being called out in some way in the programming languages used for these devices.

Mapping C++ to SPIR-V

C++ lacks most of the concepts that SPIR-V exposes for the different resource types, so only resources in the StorageBuffer storage class need to be directly supported for host-device communication, which avoids this problem.

Memory and Execution Scopes

SPIR-V defines the Scope enumeration, which corresponds to several nested groupings of threads from a small subset of threads (Subgroup) up to all threads executing across the device (Device).

All synchronization instructions (atomics, memory barriers, control barriers) include a Scope operand defining how "far" from the executing thread any synchronization should reach.

For instance, a store/release with make available semantics with the Subgroup scope cannot be made visible to a load/acquire in a different subgroup, no matter the Scope of that load/acquire.

Each Scope corresponds to a different, nested memory domain.
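The nesting of scopes can be pictured with a toy model like the one below. It is a sketch under stated assumptions: a single device, threads identified by made-up integer ids, and hypothetical type and function names. The enumerator values match the SPIR-V Scope enumeration, but only a subset is modeled.

```cpp
#include <cassert>

// Subset of the SPIR-V Scope enumeration (values from the SPIR-V spec).
enum class Scope { Device = 1, Workgroup = 2, Subgroup = 3, Invocation = 4, QueueFamily = 5 };

// Toy thread identity on a single device; ids are unique within their parent.
struct ThreadId {
    int queue_family;
    int workgroup;
    int subgroup;
    int invocation;
};

// True if `b` belongs to the instance of `scope` that contains `a`.
inline bool in_same_scope_instance(Scope scope, const ThreadId& a, const ThreadId& b) {
    const bool same_qf = a.queue_family == b.queue_family;
    const bool same_wg = same_qf && a.workgroup == b.workgroup;
    const bool same_sg = same_wg && a.subgroup == b.subgroup;
    switch (scope) {
        case Scope::Device:      return true;
        case Scope::QueueFamily: return same_qf;
        case Scope::Workgroup:   return same_wg;
        case Scope::Subgroup:    return same_sg;
        case Scope::Invocation:  return same_sg && a.invocation == b.invocation;
    }
    return false;  // unreachable for valid enum values
}

// A release can only reach an acquire if each side is inside the other's
// scope instance, whatever the scope of the other side is.
inline bool can_synchronize(Scope release_scope, const ThreadId& releaser,
                            Scope acquire_scope, const ThreadId& acquirer) {
    return in_same_scope_instance(release_scope, releaser, acquirer) &&
           in_same_scope_instance(acquire_scope, acquirer, releaser);
}

inline void scope_examples() {
    const ThreadId t0{0, 0, 0, 0};
    const ThreadId t1{0, 0, 1, 0};  // same workgroup, different subgroup
    assert(!can_synchronize(Scope::Subgroup, t0, Scope::Device, t1));
    assert(can_synchronize(Scope::Workgroup, t0, Scope::Workgroup, t1));
}
```

The first check mirrors the example in the text: a Subgroup-scoped release never reaches a thread in another subgroup, even if that thread acquires with Device scope.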

Justification

Vulkan targets devices which are typically "massively parallel" in that they can handle an enormous number of concurrent threads.

The execution units driving these are typically partitioned hierarchically, such that smaller groups of threads can communicate more efficiently than larger groups. Performing synchronization operations across smaller groups of threads is thus desirable to achieve greater performance.

Mapping C++ to SPIR-V

C++ does not have scopes, so by default compilation should elect to use the QueueFamily (or Device if supported) Scope everywhere. However, note that this is likely to have significant performance implications, and if the use of smaller scopes can be deduced as safe at compile time, they should be preferred.

Memory Domains

Availability and Visibility operations only operate within a single memory domain - in order to communicate data between memory domains, a memory domain operation is required, which guarantees that values which are available in one memory domain are made available to another memory domain.

Similar to Availability and Visibility operations, this is an explicit cache management concept, where each memory domain maps to a different part of a cache hierarchy used by a Vulkan implementation. Memory domain operations perform cache maintenance as appropriate to ensure data is correctly accessible across these cache hierarchies.

Host and device are two of the key memory domains in Vulkan, with additional sets of shader memory domains for each shader type, corresponding to the various Memory and Execution Scopes. Writes can be made available to the different memory domains by the use of API barriers or shader barriers with various scopes.

Justification

Processors such as GPUs may contain several different execution units with complex cache hierarchies. Memory domain operations, along with Availability and Visibility operations, give an implementation the information it needs to properly manage these caches.

Mapping C++ to SPIR-V

If only compute shaders are being considered (typical for compiling C++ code to a GPU), external memory domains can be ignored, focusing only on the memory domains provided by the different Memory and Execution Scopes.

Explicit memory domain operations are primarily needed when sharing data between CPU threads and shader invocations, and thus where they are needed entirely depends on how work is distributed.

No "Sequentially Consistent" Semantics

Sequentially consistent semantics are not included in the Vulkan memory model.

Justification

The sequentially consistent semantics in C++ place strict ordering guarantees on cache operations. This runs somewhat counter to the desire for Vulkan to have explicit cache management, and becomes particularly difficult to guarantee in the face of multiple distinct Memory Domains. GPUs are often designed assuming fairly weak memory ordering, and given that sequential consistency has shown problems on some CPU architectures (POWER), it was reasonable to assume there would be similar problems on GPUs.

Additionally, we had no legacy of support for sequential consistency, so there didn’t seem to be an obvious benefit to adding it now. The fact that other languages (e.g. C++ AMP) have chosen to drop sequential consistency added to our skepticism about including the feature.

Finally, it was not obvious how to reconcile sequential consistency with some of the other generalizations we’ve made to the model.

Given all of this, the Vulkan Memory Model group elected to not include this somewhat controversial semantic, as it was concluded it would place unnecessary burden on implementations for little practical benefit.

Mapping C++ to SPIR-V

There is no trivial replacement for sequentially consistent semantics in Vulkan/SPIR-V.

In many cases, it may be sufficient to downgrade these semantics to AcquireRelease, but it’s not a 1:1 match, so applications likely need to opt in to this at compile time.
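One way such an opt-in might look inside a translator is sketched below. The struct, function, and flag names are hypothetical, and this is only one possible policy. A real translator would further narrow the result to Acquire for plain loads and Release for plain stores.

```cpp
#include <atomic>
#include <cassert>

struct LoweredOrder {
    bool ok;                  // false: the translator should report an error
    std::memory_order order;  // the order to translate when ok is true
};

// Hypothetical policy: seq_cst is only weakened to acq_rel when the
// application has explicitly opted in (for example via a build flag).
inline LoweredOrder lower_seq_cst(std::memory_order requested, bool allow_downgrade) {
    if (requested != std::memory_order_seq_cst) return {true, requested};
    if (allow_downgrade) return {true, std::memory_order_acq_rel};
    return {false, requested};
}

inline void downgrade_examples() {
    assert(!lower_seq_cst(std::memory_order_seq_cst, false).ok);
    assert(lower_seq_cst(std::memory_order_seq_cst, true).order ==
           std::memory_order_acq_rel);
}
```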

Atomic Operations vs Atomic Objects

C++ atomics are exposed via dedicated atomic objects, which somewhat simplifies the memory model: there’s no interaction between atomic and non-atomic accesses to the same memory locations. SPIR-V instead has atomic operations which can be performed on regular variables.

The actual effect on the wider memory model is minimal but does mean that atomics must in some cases be synchronized as if they were non-atomic accesses. This leads to two differences in the model:

  1. Atomics accessing the same memory location are only mutually ordered if they are in the same scope and made through the same reference.

    • If two atomics are not in each other’s Scopes, they must be ordered as if they were non-atomic operations.

    • This ordering is called Scoped Modification Order, in place of C++'s Total Modification Order.

  2. Accesses performed by atomics are supersets of regular accesses, with the following differences:

    • Availability and visibility operations are performed automatically for the memory locations accessed by the atomic, for stores and loads respectively.

    • Atomic accesses and their availability and visibility operations are executed "atomically" as a part of the atomic instruction - i.e. do not contribute to data races.
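The first rule can be sketched as a predicate over two atomic operations. This is a toy model with hypothetical names: it knows only two scopes (Workgroup and Device) and identifies locations, references, and workgroups by made-up integer ids.

```cpp
#include <cassert>

struct AtomicOp {
    int location;          // id of the memory location accessed
    int reference;         // id of the reference used for the access
    int workgroup;         // workgroup of the executing thread (toy model)
    bool workgroup_scope;  // true: Workgroup scope, false: Device scope
};

// Is the thread executing `other` inside the scope instance named by `op`?
inline bool in_scope_of(const AtomicOp& op, const AtomicOp& other) {
    return !op.workgroup_scope || op.workgroup == other.workgroup;
}

// Two atomics are only mutually ordered if they hit the same location
// through the same reference and each is within the other's scope.
inline bool mutually_ordered(const AtomicOp& a, const AtomicOp& b) {
    return a.location == b.location && a.reference == b.reference &&
           in_scope_of(a, b) && in_scope_of(b, a);
}

inline void mutual_order_examples() {
    const AtomicOp a{0, 0, 0, true};   // Workgroup scope, workgroup 0
    const AtomicOp b{0, 0, 1, true};   // Workgroup scope, workgroup 1
    const AtomicOp c{0, 0, 1, false};  // Device scope, workgroup 1
    const AtomicOp d{0, 1, 1, false};  // same location, different reference
    const AtomicOp e{0, 0, 0, false};  // Device scope, workgroup 0
    assert(!mutually_ordered(a, b));  // neither is in the other's scope
    assert(!mutually_ordered(a, c));  // c is outside a's Workgroup scope
    assert(!mutually_ordered(c, d));  // different references
    assert(mutually_ordered(c, e));   // Device scope covers both threads
}
```

Pairs that fail this test fall back to the rules for non-atomic accesses, as described in the first bullet above.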

Justification

The evolution of the language has led to this situation; we’ve just defined how it works.

Mapping C++ to SPIR-V

As C++ does not expose aliasing of atomic and non-atomic objects, no mapping is really required - only use atomic operations on atomic objects, and regular operations on regular objects.

Private Memory Access

All accesses to a variable in C++ are assumed to participate in inter-thread ordering relations.

In the Vulkan memory model, we have introduced the concept of "private" vs "non-private" accesses, where only the results of non-private accesses are guaranteed to affect or be affected by accesses in other threads. Private accesses are unaffected by acquire and release semantics.

These accesses are typical for things which are read-only or written only once at the end of the computation. They are in effect only ordered against the computations that access their values, and the start or end of the program. They will eventually be synchronized at the API level.

Justification

It’s often the case that resources being read or written need to be passed between different SPIR-V program instantiations, and don’t actually need to be synchronized within a single invocation. As such, private memory access has no need to interact with barriers, and expressing this to an implementation allows a common and sometimes substantial speedup by reordering these accesses.

Mapping C++ to SPIR-V

As C++ doesn’t have this concept, loads and stores should generally be decorated with the NonPrivate semantic. The only exception is in the definition of const memory, where the semantic could be avoided if there is no const casting and there are no mutable aliases.
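The rule above amounts to a small decision function. The sketch below uses hypothetical names and assumes the translator has already computed the three facts about an object; how it would prove them is outside the scope of this illustration.

```cpp
#include <cassert>

// Facts a hypothetical translator has gathered about one C++ object.
struct ObjectInfo {
    bool declared_const;     // the object is defined as const
    bool const_cast_used;    // const is cast away somewhere
    bool has_mutable_alias;  // a non-const alias to the memory exists
};

// Conservative default: everything is NonPrivate unless the object is
// provably read-only for the whole program.
inline bool needs_non_private(const ObjectInfo& o) {
    const bool provably_read_only =
        o.declared_const && !o.const_cast_used && !o.has_mutable_alias;
    return !provably_read_only;
}

inline void non_private_examples() {
    assert(needs_non_private(ObjectInfo{false, false, false}));  // ordinary variable
    assert(!needs_non_private(ObjectInfo{true, false, false}));  // safe const data
    assert(needs_non_private(ObjectInfo{true, true, false}));    // const cast away
}
```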

Program Order

The Vulkan/SPIR-V memory model defines a simple term 'program order', rather than the more complex 'sequenced before' in C++. Roughly, this is the order in which instructions are specified in a SPIR-V program.

Justification

SPIR-V is an IR where each basic block is a simple ordered list of instructions, rather than a more complex higher level language. This leads to a simple and concise definition of 'program order', which stands in contrast to C++'s more complicated 'sequenced before' order.

Mapping C++ to SPIR-V

While in C++ some evaluations are indeterminately sequenced, any two SPIR-V instructions in an execution are related by program order. When compiling C++ to SPIR-V, it is left to the C++ compiler to choose an ordering for indeterminately sequenced evaluations.
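As a small C++ illustration of indeterminate sequencing (helper names are hypothetical): the two calls below may run in either order, so a compiler targeting SPIR-V simply picks one and that choice becomes the program order.

```cpp
#include <cassert>
#include <vector>

// Records the order in which calls actually happen.
inline int record(std::vector<int>& order, int id) {
    order.push_back(id);
    return id;
}

// The two calls to record() are indeterminately sequenced: C++ allows
// either to run first, so `order` may end up as {1, 2} or {2, 1}.
inline std::vector<int> evaluate_sum(int& sum) {
    std::vector<int> order;
    sum = record(order, 1) + record(order, 2);
    return order;
}

inline void sequencing_example() {
    int sum = 0;
    const std::vector<int> order = evaluate_sum(sum);
    assert(sum == 3);           // the value is the same either way
    assert(order.size() == 2);  // both calls ran, in some order
}
```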

Control Barriers

One additional feature offered by the Vulkan memory model is the inclusion of control barriers. Control barriers ensure that all threads in the specified scope have reached the same point of execution. Use of control barriers is the only well-defined method to do this within a SPIR-V program.

Justification

Given the Limited Forward Progress Guarantees of Vulkan SPIR-V, this is a fairly essential mechanism as the only way to safely block threads for inter-thread communication.

Mapping C++ to SPIR-V

There are no direct mappings to C++ for control barriers.

Limited Forward Progress Guarantees

There is no specification or even recommendation that all unblocked threads, within a single execution unit, will eventually make forward progress when one or more threads are blocked on a user-defined condition (e.g. spin-locks). Control Barriers are the only well-defined mechanism available in SPIR-V to synchronize multiple threads.

The Vulkan API as a whole does make forward progress guarantees, however, which allows applications to perform coarse synchronization at the API level to drive the desired ordering.

Justification

Current GPU hardware does not have these scheduling guarantees.

Mapping C++ to SPIR-V

C++ only recommends this kind of forward progress and does not require it - thus no handling is strictly necessary. However, on generally available platforms supporting C++, these guarantees are made and thus assumed to be true - so this is likely to surprise programmers used to multi-threaded CPU coding.

C++ compilers targeting SPIR-V should ensure that it is made extremely obvious that user-defined blocking operations are undefined behavior. Library implementations of synchronization functions can make use of API level synchronization to ensure the required forward progress at the cost of efficiency.
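The kind of user-defined blocking operation in question looks like the sketch below (the function name is hypothetical). On typical CPU platforms it works because the thread that sets the flag is scheduled eventually. Translated to SPIR-V, nothing guarantees that the setting thread ever runs while this one spins.

```cpp
#include <atomic>
#include <cassert>

// Typical CPU-style blocking wait. It only terminates if the thread that
// sets `flag` makes progress while this thread spins, which is exactly the
// guarantee SPIR-V does not give.
inline int spin_until_set(const std::atomic<bool>& flag) {
    int spins = 0;
    while (!flag.load(std::memory_order_acquire)) {
        ++spins;  // busy-wait
    }
    return spins;
}

inline void spin_example() {
    std::atomic<bool> flag{true};  // already set, so the wait returns at once
    assert(spin_until_set(flag) == 0);
}
```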

Summary

Whilst the list of differences isn’t particularly short, and this list isn’t exhaustive, overall it’s not an enormous divergence. If you’ve read this with a C++ memory model background, hopefully, this post has helped you answer some questions, and made the Vulkan/SPIR-V memory model a bit more accessible to you!
