Is anyone using this function, or does anyone understand exactly what it does?
I have a matrix in global memory:
Code :
ooooooooooo
ooooooooooo
ooooXXXXXoo
ooooXXXXXoo
ooooXXXXXoo
ooooooooooo
And I need to copy a subregion of the matrix into local memory:
Code :
XXXXX
XXXXX
XXXXX
For now I manually calculate how many element copy operations each work-item in a work-group should perform. The code is not very simple, and it will become much more complex once the number of dimensions of the "matrix" becomes variable (more than 2).
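To make the manual approach concrete, here is a minimal sketch of the kind of per-work-item calculation I mean. It is written as plain C rather than kernel code so it can be run on the host. The helper name `copy_tile_share` and all of its parameters are hypothetical: `lid` and `lsize` stand in for `get_local_id(0)` and `get_local_size(0)`, and `tile` stands in for the local buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the share of ONE work-item in a cooperative tile copy.
   src        - the whole matrix (stand-in for the __global buffer)
   tile       - dense rows x cols destination (stand-in for __local memory)
   offset     - index of the tile's first element in src
   row_stride - distance in elements between consecutive tile rows in src */
static void copy_tile_share(const float *src, float *tile,
                            size_t offset, size_t row_stride,
                            size_t rows, size_t cols,
                            size_t lid, size_t lsize)
{
    assert(lsize > 0); /* a zero step would never leave the loop */

    /* Treat the tile as a flat index space; each work-item takes every
       lsize-th element starting at its own local id. */
    for (size_t i = lid; i < rows * cols; i += lsize) {
        size_t r = i / cols; /* row inside the tile */
        size_t c = i % cols; /* column inside the tile */
        tile[i] = src[offset + r * row_stride + c];
    }
}
```

For the 6x11 matrix above, the tile starts at offset 2 * 11 + 4 = 26, with row_stride = 11, rows = 3 and cols = 5. With more dimensions, the single divide/modulo pair turns into a chain of them, which is where the complexity comes from.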
But I know the initial offset, the number of contiguous regions I need to copy, and the "distance" in the global buffer between these regions. Can I somehow use the async_work_group_strided_copy function efficiently here instead of the manual calculations?
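For reference, my reading of the OpenCL specification is that async_work_group_strided_copy, in the global-to-local direction, gathers individual elements that are src_stride elements apart and packs them densely into the destination. It does not copy contiguous runs separated by a stride. The plain-C model below is only a sketch of that reading, not OpenCL code; the `model_*` function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of the strided gather: dst[i] is read from src[i * src_stride]. */
static void model_strided_gather(float *dst, const float *src,
                                 size_t num_elements, size_t src_stride)
{
    assert(src_stride > 0);
    for (size_t i = 0; i < num_elements; ++i)
        dst[i] = src[i * src_stride];
}

/* Tile copy as one contiguous copy per row (models one
   async_work_group_copy call per row). Result is row-major. */
static void model_tile_by_rows(float *tile, const float *src, size_t offset,
                               size_t row_stride, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r)
        memcpy(tile + r * cols, src + offset + r * row_stride,
               cols * sizeof(float));
}

/* Tile copy as one strided gather per column. Each gather packs a column
   densely, so the result is column-major (the tile comes out transposed). */
static void model_tile_by_columns(float *tile_t, const float *src,
                                  size_t offset, size_t row_stride,
                                  size_t rows, size_t cols)
{
    for (size_t c = 0; c < cols; ++c)
        model_strided_gather(tile_t + c * rows, src + offset + c,
                             rows, row_stride);
}
```

If this reading is right, the strided variant only fits the layout above column by column, and it yields a transposed tile. In a kernel, the per-row variant would presumably be one `async_work_group_copy(tile + r * cols, src + offset + r * row_stride, cols, 0)` call per row, executed by every work-item in the work-group with the same arguments, followed by `wait_group_events` before the tile is read.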