Description

The OpenCL C programming language implements the following functions that provide asynchronous copies between global and local memory and a prefetch from global memory.

We use the generic type name gentype to indicate the built-in data types char, char{2|341|4|8|16}, uchar, uchar{2|3|4|8|16}, short, short{2|3|4|8|16}, ushort, ushort{2|3|4|8|16}, int, int{2|3|4|8|16}, uint, uint{2|3|4|8|16}, long, long{2|3|4|8|16}, ulong, ulong{2|3|4|8|16}, float, float{2|3|4|8|16}, or double, double{2|3|4|8|16} as the type for the arguments unless otherwise stated.

[41] async_work_group_copy and async_work_group_strided_copy for 3-component vector types behave as async_work_group_copy and async_work_group_strided_copy respectively for 4-component vector types.

Table 1. Built-in Async Copy and Prefetch Functions

Function

Description

event_t async_work_group_copy(local gentype *dst, const global gentype *src, size_t num_gentypes, event_t event)
event_t async_work_group_copy(global gentype *dst, const local gentype *src, size_t num_gentypes, event_t event)

Perform an async copy of num_gentypes gentype elements from src to dst. The async copy is performed by all work-items in a work-group and this built-in function must therefore be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined. This rule applies to ND-ranges implemented with uniform and non-uniform work-groups.

Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_copy with a previous async copy allowing an event to be shared by multiple async copies; otherwise event should be zero.

0 can be implicitly and explicitly cast to event_t type.

If event argument is non-zero, the event object supplied in event argument will be returned.

This function does not perform any implicit synchronization of source data such as using a barrier before performing the copy.

event_t async_work_group_strided_copy(local gentype *dst, const global gentype *src, size_t num_gentypes, size_t src_stride, event_t event)
event_t async_work_group_strided_copy(global gentype *dst, const local gentype *src, size_t num_gentypes, size_t dst_stride, event_t event)

Perform an async gather of num_gentypes gentype elements from src to dst. The src_stride is the stride in elements for each gentype element read from src. The dst_stride is the stride in elements for each gentype element written to dst. The async gather is performed by all work-items in a work-group. This built-in function must therefore be encountered by all work-items in a work-group executing the kernel with the same argument values; otherwise the results are undefined. This rule applies to ND-ranges implemented with uniform and non-uniform work-groups

Returns an event object that can be used by wait_group_events to wait for the async copy to finish. The event argument can also be used to associate the async_work_group_strided_copy with a previous async copy allowing an event to be shared by multiple async copies; otherwise event should be zero.

0 can be implicitly and explicitly cast to event_t type.

If event argument is non-zero, the event object supplied in event argument will be returned.

This function does not perform any implicit synchronization of source data such as using a barrier before performing the copy.

The behavior of async_work_group_strided_copy is undefined if src_stride or dst_stride is 0, or if the src_stride or dst_stride values cause the src or dst pointers to exceed the upper bounds of the address space during the copy.

void wait_group_events(int num_events, event_t *event_list)

Wait for events that identify the async_work_group_copy operations to complete. The event objects specified in event_list will be released after the wait is performed.

This function must be encountered by all work-items in a work-group executing the kernel with the same num_events and event objects specified in event_list; otherwise the results are undefined. This rule applies to ND-ranges implemented with uniform and non-uniform work-groups

void prefetch(const _global gentype *_p, size_t num_gentypes)

Prefetch num_gentypes * sizeof(gentype) bytes into the global cache. The prefetch instruction is applied to a work-item in a work-group and does not affect the functional behavior of the kernel.

The kernel must wait for the completion of all async copies using the wait_group_events built-in function before exiting; otherwise the behavior is undefined.

See Also

No cross-references are available

Document Notes

For more information, see the OpenCL C Specification

This page is extracted from the OpenCL C Specification. Fixes and changes should be made to the Specification, not directly.

Copyright (c) 2014-2020 Khronos Group. This work is licensed under a Creative Commons Attribution 4.0 International License.