I just realized that the ATI implementation does not seem to support out-of-order execution (OOOE). The device is a Radeon HD6970.
What exactly does it mean that OOOE is not supported? Assume the following operations are enqueued:
1. Write to buffer A
2. Execute kernel that uses A
3. Write to buffer B
4. Execute kernel that uses B
Given my setup, is there a chance that operations 2 and 3 run in parallel, thereby overlapping computation and communication?