how can I load arrays to constant memory space



noah_r
01-23-2013, 04:36 PM
I'm confused about how to load read-only array values into the device's constant memory space rather than the global memory space. The moderately sized array nodes_x in the kernel below meets all the criteria under which some hardware should perform better when the array is placed in constant memory instead of global memory.


__kernel void sequenceKernel_00( constant const float * nodes_x, global float * const restrict density_n)
{ /* etc. */ }

My code already works with the first nodes_x argument being declared global. I have been passing the flag CL_MEM_READ_ONLY to clCreateBuffer().
I'm surprised that clCreateBuffer() does not take an additional argument that specifies the desired memory space.

So, it seems insufficient to me that all I have to change to move the nodes_x array into constant memory is the kernel argument declaration from global to constant.

As an experiment, I changed the flag for my nodes_x buffer creation to CL_MEM_WRITE_ONLY. Nevertheless, clSetKernelArg() still reports CL_SUCCESS when setting the constant nodes_x argument to that buffer. This is clearly wrong and makes me think I'm missing something.

Since cl_buffers are created independently of any particular kernel, and clEnqueueWriteBuffer() can be called before clSetKernelArg(), I don't see how any late binding can be in play.

(running Mac OS X 10.8 and an NVIDIA Tesla GPU)

matthiasv
01-24-2013, 12:47 AM
noah_r wrote: "So, it seems insufficient to me, that all I have to change to move nodes_x array into constant memory is the kernel argument declaration from global to constant."

But it is sufficient. When the kernel is executed and the memory object is larger than CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, you will get an error. Moreover, if you specify CL_MEM_WRITE_ONLY, you'll get "undefined behaviour" as per section 5.2.1 of the OpenCL 1.1 specification.
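
Since the size limit is only enforced when the kernel runs, it may be convenient to check it on the host beforehand. The following is a minimal sketch in plain C, not code from this thread: the helper name fits_in_constant_memory is hypothetical, and max_constant_bytes is assumed to be the value the host has already read with clGetDeviceInfo() using CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of OpenCL): returns 1 when an array of
 * `count` floats fits within `max_constant_bytes`, and 0 otherwise.
 * `max_constant_bytes` is assumed to be the limit the host queried via
 * clGetDeviceInfo() with CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE. */
int fits_in_constant_memory(size_t count, uint64_t max_constant_bytes)
{
    /* Divide the limit instead of multiplying the count, so that a very
     * large count cannot overflow the comparison. */
    return (uint64_t)count <= max_constant_bytes / sizeof(float);
}
```

If the check fails for nodes_x, one option is to leave the kernel argument declared global, which the original post reports as already working.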