work-group to work-group direct data transfer (DMA)



activedaily
02-18-2011, 02:34 AM
There is currently no way to send data directly between work-groups, and using async_work_group_copy for this is far from optimal. My suggestion is to have something like this:

event_t async_direct_work_group_copy (
wgtypen dst_work_group,
__local gentype *dst,
const __global gentype *src,
size_t num_gentypes,
event_t event);


dst_work_group - the destination work-group's ID in the ND-range, of type wgtypen, where n = 1, 2, ..., MAX_DIM
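
For comparison, here is a minimal sketch of what the existing built-in does today: async_work_group_copy moves data between global memory and the calling work-group's own local memory, so it cannot target another work-group. The kernel name reverse_tiles, the constant TILE, and everything inside the #ifndef blocks (the emu_* variables and emulate_reverse_tiles) are assumptions made for illustration. The shim only lets the same source compile as plain C for a serial host-side emulation; it is not part of OpenCL.

```c
/* Illustration-only shim: a real OpenCL compiler defines __OPENCL_VERSION__
 * and skips this block. In plain C it emulates the few built-ins used below. */
#ifndef __OPENCL_VERSION__
#include <assert.h>
#include <stddef.h>
#include <string.h>
#define __kernel
#define __global
#define __local static          /* one buffer reused by every emulated work-item */
typedef int event_t;
static size_t emu_group, emu_local;
static size_t get_group_id(unsigned d) { (void)d; return emu_group; }
static size_t get_local_id(unsigned d) { (void)d; return emu_local; }
static event_t async_work_group_copy(float *dst, const float *src,
                                     size_t n, event_t ev)
{
    memcpy(dst, src, n * sizeof *dst);
    return ev;
}
static void wait_group_events(int n, event_t *evs) { (void)n; (void)evs; }
#endif

#define TILE 4  /* assumed work-group size; the launch must use exactly this */

/* Each work-group stages its own tile in local memory, then writes it back
 * reversed. The copy is global -> local within a single work-group. */
__kernel void reverse_tiles(__global const float *in, __global float *out)
{
    __local float tile[TILE];
    size_t base = get_group_id(0) * TILE;
    size_t lid  = get_local_id(0);

    /* Every work-item in the group makes this call with identical arguments. */
    event_t ev = async_work_group_copy(tile, in + base, TILE, 0);
    wait_group_events(1, &ev);

    out[base + lid] = tile[TILE - 1 - lid];
}

#ifndef __OPENCL_VERSION__
/* Hypothetical host-side emulation: runs the work-items one after another. */
static void emulate_reverse_tiles(const float *in, float *out, size_t num_groups)
{
    assert(in != NULL && out != NULL);
    for (emu_group = 0; emu_group < num_groups; ++emu_group)
        for (emu_local = 0; emu_local < TILE; ++emu_local)
            reverse_tiles(in, out);
}
#endif
```

Note that both the source and the destination are tied to the calling work-group; there is no parameter that names a different work-group, which is the gap the proposal above is about.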

david.garcia
02-20-2011, 10:45 AM
Let's see if I understand what you are proposing. You want a work-group to write data into the local memory of another work-group?

Do you realize that work-groups execute asynchronously from each other? How do you know that the destination work-group has not already finished executing? Or what if it is in the middle of the execution?

activedaily
02-24-2011, 01:49 PM
The current OpenCL standard is too limited when it comes to communication between threads in different work-groups: it can only go through global memory. As a result there is a bottleneck. This is a well-known problem for algorithms with heavy data flow, and the methods for improving it are also well known.
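
To make the bottleneck concrete, here is a minimal sketch of the portable route available today: one kernel launch publishes data to a global staging buffer, and a second launch lets each work-group read what another work-group wrote. The kernel names publish and consume, the staging buffer, and everything inside the #ifndef blocks (the emu_* variables and emulate_exchange) are assumptions made for illustration. The shim only lets the same source compile as plain C for a serial host-side emulation.

```c
/* Illustration-only shim: a real OpenCL compiler defines __OPENCL_VERSION__
 * and skips this block. In plain C it emulates the index built-ins. */
#ifndef __OPENCL_VERSION__
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
static size_t emu_group, emu_local, emu_local_size, emu_num_groups;
static size_t get_group_id(unsigned d)   { (void)d; return emu_group; }
static size_t get_local_id(unsigned d)   { (void)d; return emu_local; }
static size_t get_local_size(unsigned d) { (void)d; return emu_local_size; }
static size_t get_num_groups(unsigned d) { (void)d; return emu_num_groups; }
#endif

/* Launch 1: every work-item publishes one value to global memory. */
__kernel void publish(__global const float *in, __global float *staging)
{
    size_t gid = get_group_id(0) * get_local_size(0) + get_local_id(0);
    staging[gid] = in[gid] * 2.0f;
}

/* Launch 2: every work-group reads what the next work-group published. */
__kernel void consume(__global const float *staging, __global float *out)
{
    size_t lsz  = get_local_size(0);
    size_t lid  = get_local_id(0);
    size_t next = (get_group_id(0) + 1) % get_num_groups(0);
    out[get_group_id(0) * lsz + lid] = staging[next * lsz + lid];
}

#ifndef __OPENCL_VERSION__
/* Hypothetical host-side emulation: runs both launches serially. */
static void emulate_exchange(const float *in, float *staging, float *out,
                             size_t num_groups, size_t local_size)
{
    assert(num_groups > 0 && local_size > 0);
    emu_num_groups = num_groups;
    emu_local_size = local_size;

    for (emu_group = 0; emu_group < num_groups; ++emu_group)
        for (emu_local = 0; emu_local < local_size; ++emu_local)
            publish(in, staging);

    /* The first launch has fully completed here, so every work-group's
     * data is in the staging buffer before anyone reads it. */
    for (emu_group = 0; emu_group < num_groups; ++emu_group)
        for (emu_local = 0; emu_local < local_size; ++emu_local)
            consume(staging, out);
}
#endif
```

Every exchanged value makes a round trip through global memory, and the exchange costs a second kernel launch, which is the overhead a direct work-group to work-group transfer would aim to avoid.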

Moreover, we need direct communication between different kernels that is more efficient than going through global memory. For instance:

event_t async_global_direct_work_group_copy (
kerneltype dst_kernel,
wgtypen dst_work_group,
__local gentype *dst,
const __local gentype *src,
size_t num_gentypes,
event_t event);

Synchronization between threads in different work-groups is not covered directly in the standard, but supporting it would not be a problem.

Local memory optimizations can give significant performance improvements if these features are supported at the hardware level.

sean.settle
03-16-2011, 10:16 AM
This is a little different but follows on the idea of local memory in work-groups running async.

From what I understand, work-items within a work-group can be swapped in and out, and similarly work-groups can be swapped in and out? When work-items are swapped, local memory and registers are left intact until they completely finish. Is this true for work-groups as well? And the local memory is non-overlapping from each work-group, even if they're running on the same compute unit?

david.garcia
03-16-2011, 11:25 AM
"From what I understand, work-items within a work-group can be swapped in and out, and similarly work-groups can be swapped in and out?"

The answer to both is very implementation-dependent.

"And the local memory is non-overlapping from each work-group, even if they're running on the same compute unit?"

The local memory of each work-group is independent from all the other work-groups.

sean.settle
03-16-2011, 11:33 AM
David, thanks for the great answers!