In order to have coalesced access to global memory, memory addresses must increase sequentially across the work-items in the wavefront and start on a 128-byte alignment boundary.
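
To make sure I am reading the rule correctly, here is a minimal host-side C sketch of how I understand it. It only models addresses as plain integers and does not call OpenCL at all. The helper names `model_addresses` and `looks_coalesced` are my own (hypothetical, not part of any AMD or OpenCL API). I am also assuming that "sequentially" means each work-item touches the element immediately after the one touched by the previous work-item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Taken from the quoted guideline: accesses must start on a 128-byte boundary. */
#define COALESCE_ALIGN 128u

/* Hypothetical helper: fill addr[i] with the byte address that work-item i
 * would touch when it accesses element (offset + i * stride) of a buffer
 * that starts at `base`. */
static void model_addresses(uintptr_t base, size_t offset, size_t stride,
                            size_t elem_size, uintptr_t *addr, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        addr[i] = base + (offset + i * stride) * elem_size;
}

/* Hypothetical helper: returns true when the first address is 128-byte
 * aligned and every following address is exactly elem_size bytes after
 * the previous one (my reading of "increase sequentially"). */
static bool looks_coalesced(const uintptr_t *addr, size_t n, size_t elem_size)
{
    assert(elem_size > 0);
    if (n == 0 || addr[0] % COALESCE_ALIGN != 0)
        return false;
    for (size_t i = 1; i < n; ++i) {
        if (addr[i] != addr[i - 1] + elem_size)
            return false;
    }
    return true;
}
```

With an aligned base, 4-byte elements, and one element per work-item, `looks_coalesced` accepts the pattern. Shifting the start by one element or using a stride of two makes it reject the pattern. What this sketch cannot tell me is the real base address of a buffer on the device, which is exactly what I am asking about below.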

My very newbie questions are: How are buffers created with clCreateBuffer aligned (and, more generally, every argument passed to a kernel function)? Does the alignment also depend on the flags chosen at creation? Is there a way to check whether global memory accesses are coalesced on an AMD platform?

Thanks