The code inside my kernel has been tested in a single-threaded environment (it was the body of a doubly nested for loop).

The 'object' that the kernel should be operating on is a triple pointer to a struct I made (cell). It is defined as such:
Code :
cell ***cellGrid

I access it like:
Code :
cellGrid[height][width]->value = 10;

That all works fine in the single-threaded (CPU-only) version.

Now, with the parallel version, the kernel gives me memory errors. The reported error is: "Error info: CL_OUT_OF_RESOURCES error executing CL_COMMAND_READ_BUFFER on GeForce GTX 280 (Device 0)." - which isn't too ambiguous.

When I comment out the contents of my kernel, it does not crash. When I add just the following line to the kernel, it crashes.
Code :
cellGrid[0][0]->value = 5;

Here is how I'm copying the data onto the GPU, reading it from the GPU, and calling the NDRange kernel:
Code :
clEnqueueWriteBuffer(queue, gpu_memory, CL_TRUE, 0, GRID_DATA_SIZE, **cellGrid, 0, NULL, NULL);
clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global_workgroup_length, local_workgroup_length, 0, NULL, &event);
clEnqueueReadBuffer(queue, gpu_memory, CL_TRUE, 0, GRID_DATA_SIZE, **cellGrid, 0, NULL, NULL);

Where:
global_workgroup_length is a 1D, 2 element array of size_t which has the values 600 and 600.
local_workgroup_length is a 1D, 2 element array of size_t which has the values 20 and 20.
GRID_DATA_SIZE = 600 * 600 * sizeof(cell)
cellGrid is a properly allocated triple pointer to a cell.
gpu_memory is a valid cl_mem object (status of CL_SUCCESS), created with the following line:
Code :
gpu_memory = clCreateBuffer(context, CL_MEM_READ_WRITE, GRID_DATA_SIZE, NULL, &status);

I feel the issue has something to do with how I'm copying the data to the GPU (or reading it back), but I don't see how that would matter. Though it's a triple pointer, if I dereference it twice, it should be copying the actual cell structs, which is what I want. The buffer comes into the kernel as a pointer to a cell struct, but I immediately cast it to a triple pointer to a cell struct, via:
Code :
cell ***cellGrid = (cell ***)_cellGrid;

I'm still questioning the legality of how I'm copying to and retrieving from the GPU. I feel that as long as I stay within the bounds of the height and width (both 600), and the memory is contiguous, I should be fine...though I still feel as though this could be a problem. Is what I'm doing legit? Are there other problems you see?

On one hand, since the memory is contiguous, I feel it should work...on the other hand, I feel as though the "intermediate" pointers may have been lost in translation. Newb C fail?
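
To make the concern about the "intermediate" pointers concrete: `**cellGrid` is a single `cell *`, so the write copies GRID_DATA_SIZE bytes starting at the first cell and never copies the two pointer tables in front of it. Whether those bytes even cover the whole grid depends on how the cells were allocated, which isn't shown here. A minimal host-side sketch of one way around both questions is to flatten the grid into a single pointer-free array before the write and copy it back after the read. The `cell` layout and the helper names `flatten_grid` and `unflatten_grid` are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout: the post only shows that cell has a `value` member. */
typedef struct { int value; } cell;

/* Hypothetical helper: copy a height x width grid of cell pointers into one
   contiguous, row-major array that contains no host pointers.
   Returns NULL on allocation failure; the caller frees the result. */
cell *flatten_grid(cell ***grid, size_t height, size_t width)
{
    assert(grid != NULL);
    cell *flat = malloc(height * width * sizeof *flat);
    if (flat == NULL)
        return NULL;
    for (size_t row = 0; row < height; ++row)
        for (size_t col = 0; col < width; ++col)
            flat[row * width + col] = *grid[row][col];
    return flat;
}

/* Hypothetical helper: copy results from the flat array back into the
   pointer-based grid after reading the buffer from the device. */
void unflatten_grid(const cell *flat, cell ***grid, size_t height, size_t width)
{
    assert(flat != NULL && grid != NULL);
    for (size_t row = 0; row < height; ++row)
        for (size_t col = 0; col < width; ++col)
            *grid[row][col] = flat[row * width + col];
}
```

With this, the flat array (rather than `**cellGrid`) would be what gets passed to clEnqueueWriteBuffer and clEnqueueReadBuffer, and the kernel would treat its argument as a plain pointer to cell instead of casting it to a triple pointer.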

Any help is appreciated.

EDIT:
I forgot to mention: the max workgroup size is 512 and my local workgroup size is 20 in both dimensions, so 20 * 20 = 400 < 512.
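
For completeness, the arithmetic behind that check can be sketched in plain C. This covers only two of the conditions clEnqueueNDRangeKernel imposes under OpenCL 1.x (the product of the local sizes must not exceed the maximum workgroup size, and each global size must be evenly divisible by the matching local size); the per-dimension device limits are not checked. The function name `work_sizes_ok` is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the local sizes are compatible with the
   global sizes and the maximum workgroup size, 0 otherwise. */
int work_sizes_ok(const size_t *global, const size_t *local,
                  size_t dims, size_t max_group)
{
    assert(global != NULL && local != NULL);
    size_t items = 1;                       /* work-items per workgroup */
    for (size_t d = 0; d < dims; ++d) {
        if (local[d] == 0 || global[d] % local[d] != 0)
            return 0;                       /* global not divisible by local */
        items *= local[d];
    }
    return items <= max_group;
}
```

With the values from this post (global 600 x 600, local 20 x 20, maximum 512) both conditions hold, so the workgroup sizes themselves do not look like the cause.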

EDIT 2:
I'm hoping to access my 2D grid inside the kernel via:
Code :
int x = get_global_id(0);
int y = get_global_id(1);
cellGrid[x][y]->value = 500;

Though right now it won't even work with 0 for both indices.
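
If the buffer really holds nothing but 600 * 600 cell structs, then an alternative to the double subscript is to compute the offset by hand. Below is a host-side C stand-in for what one work-item would do, assuming row-major storage with x as the column and y as the row (swap them if the grid is stored the other way around). The `cell` layout and the names `flat_index` and `set_cell` are assumptions for illustration; in the kernel, x and y would come from get_global_id(0) and get_global_id(1), and the pointer would be a `__global cell *`.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: the post only shows that cell has a `value` member. */
typedef struct { int value; } cell;

/* Row-major offset of column x, row y in a grid that is `width` cells wide. */
size_t flat_index(size_t x, size_t y, size_t width)
{
    assert(x < width);
    return y * width + x;
}

/* Stand-in for the kernel body: one call per (x, y) work-item. */
void set_cell(cell *grid, size_t x, size_t y, size_t width, int value)
{
    grid[flat_index(x, y, width)].value = value;
}
```

Because no pointer stored inside the buffer is ever dereferenced, this form does not depend on host addresses being meaningful on the device.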