How to index local accrays for best performance?



tmp
06-09-2010, 01:49 AM
I have a problem understanding the memory access patterns for kernels in OpenCL. Consider the kernel snippet below. It runs in a two-dimensional 16x16 work-group, and the 16x16 threads collaborate on initializing a 16x16 local array (As) with elements from a larger global array.


...
int ti = get_local_id(0);
int tj = get_local_id(1);
__local int As[16][16];
As[ti][tj] = someGlobalArray[...];
barrier(CLK_LOCAL_MEM_FENCE);
...

The strange thing is that if I index As as As[tj][ti] instead of As[ti][tj], the code runs much faster. Can anyone explain why?

tmp
06-09-2010, 01:53 AM
Oops, the title should of course say "arrays", not "accrays" (whatever that is)... :-)

dominik
06-09-2010, 05:44 AM
Do you only change the indices for the access to As, or do you also change the indices for your global array?

dominik
06-09-2010, 06:18 AM
If you're not changing the access to global memory, then the problem could be bank conflicts.
Using As[ti][tj] means that adjacent work-items access the same column, i.e. addresses 16 elements apart, and therefore the same bank (because your array width is 16). The accesses then get serialized. With As[tj][ti], adjacent work-items access consecutive elements of the same row and thus different banks. This is why it is so much faster.
You can try padding your array (e.g. As[16][17]) to avoid this.
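
To make the bank argument concrete, here is a small host-side sketch in plain C (not kernel code) that counts how many of 16 adjacent work-items land on the same bank. It assumes 16 banks, each one 32-bit word wide, and that work-items with consecutive get_local_id(0) issue their local accesses together; both are hardware assumptions, so check your device's documentation. The names bank_of and conflict_degree are made up for this illustration.

```c
#include <assert.h>

#define NUM_BANKS 16  /* assumption: 16 local-memory banks, one int per bank slot */
#define GROUP     16  /* assumption: 16 adjacent work-items access memory together */

/* Bank holding element [row][col] of a row-major int array with `width` columns. */
static int bank_of(int row, int col, int width)
{
    return (row * width + col) % NUM_BANKS;
}

/*
 * Largest number of work-items that hit one bank when work-items
 * ti = 0..15 (tj fixed at 0) access
 *   As[ti][tj]  if ti_is_row != 0, or
 *   As[tj][ti]  if ti_is_row == 0,
 * where As is declared with `width` columns.
 * A result of 1 means no conflict; 16 means fully serialized.
 */
static int conflict_degree(int width, int ti_is_row)
{
    int hits[NUM_BANKS] = {0};
    int worst = 0;
    const int tj = 0;

    assert(width >= GROUP);  /* each row must hold at least 16 elements */

    for (int ti = 0; ti < GROUP; ++ti) {
        int b = ti_is_row ? bank_of(ti, tj, width)
                          : bank_of(tj, ti, width);
        hits[b]++;
        if (hits[b] > worst)
            worst = hits[b];
    }
    return worst;
}
```

Under those assumptions, conflict_degree(16, 1) models As[ti][tj] on As[16][16] and gives 16 (every work-item on the same bank), conflict_degree(16, 0) models As[tj][ti] and gives 1, and conflict_degree(17, 1) models As[ti][tj] on the padded As[16][17] and also gives 1, because a row stride of 17 shifts each row by one bank.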