For the code I'm trying to write, I plan on having each workgroup manipulate a set of 2D arrays. What is generally considered the easiest way to code this?

Right now, what I'm trying is flattening a 3D matrix, then addressing it using a scheme like: matrix[(numGroups*localSize*wgroup)+localSize*x+y]
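
For reference, the usual row-major layout for a stack of N x N matrices stored back to back is index = group*N*N + row*N + col. The following is only a minimal plain-C sketch of that layout, not the kernel code from this post; the helper name flat_index and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (row, col) of matrix number `group` when several
   n x n matrices are stored one after another in a single flat buffer.
   flat_index is a hypothetical helper used only for illustration. */
static size_t flat_index(size_t group, size_t row, size_t col, size_t n)
{
    assert(row < n && col < n);      /* stay inside one matrix */
    return group * n * n             /* skip the earlier matrices */
         + row * n                   /* skip the earlier rows */
         + col;                      /* position within the row */
}
```

With this layout, the stride between matrices is n*n elements and the stride between rows is n elements, so neither stride depends on the number of work-groups.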

I'd like to copy each matrix into a local matrix per work-group and work on it from there.
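
To make the copy-in, work, copy-out idea concrete, here is a rough plain-C stand-in for what a single work-group would do. It is a sketch under assumptions, not OpenCL code: the name process_group and the fixed size N = 4 are hypothetical, and the scratch array only plays the role that __local memory would play in a kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 4  /* hypothetical matrix size, for illustration only */

/* Plain-C stand-in for one work-group: copy its N x N matrix out of the
   flat buffer into a scratch array, modify the copy, then write it back. */
static void process_group(int *flat, size_t group)
{
    assert(flat != NULL);

    int scratch[N][N];                       /* stands in for __local memory */
    int *src = flat + group * N * N;         /* start of this group's matrix */

    memcpy(scratch, src, sizeof scratch);    /* copy in */
    for (size_t row = 0; row < N; ++row)
        scratch[row][0]++;                   /* work on the local copy */
    memcpy(src, scratch, sizeof scratch);    /* copy back */
}
```

In an actual kernel the work-items of a group would share the copying, and a barrier(CLK_LOCAL_MEM_FENCE) would be needed between the copy-in, the work, and the copy-out phases so that every work-item sees the completed local data.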

Am I headed in the right direction? Are there any good examples of how to do this?

Here's some example host code:

Code :
size_t global_worksize=N*N*computeUnits;
size_t local_worksize=N*N;
error=clEnqueueNDRangeKernel(cq, k_matTest, 1, NULL, &global_worksize, &local_worksize, 0, NULL, NULL);
error=clFinish(cq);
error=clEnqueueReadBuffer(cq, mem, CL_FALSE, 0, global_worksize*sizeof(int), matrix, 0, NULL, NULL);

N is the size of each matrix (they are square, N x N) and computeUnits is the number of work-groups I want (for now, it's CL_DEVICE_MAX_COMPUTE_UNITS).

and my kernel code is:

Code :
__kernel void matrixTest( __global int *matrix)
{
    const size_t numGroups = get_num_groups(0);
    const size_t localSize = get_local_size(0);
    size_t wgroup = get_group_id(0);
    size_t x = get_local_id(0);
    size_t y = 0;
    matrix[(numGroups*localSize*wgroup)+localSize*x+y]++;
}

I initialize the matrix to all zeros. I then expect each matrix to have a "1" in the first column of each row... but I don't get that. In my kernel, I'm expecting localSize=N - is that correct?

I'm really new to this, so I apologize if it's totally wrong.

Thanks!