Hi all,
I'm relatively new to OpenCL (though experienced with CUDA; I'm currently porting an application, still on an NVIDIA 285 device) and to these forums, but I've come across an area not addressed by any of the documentation I have (OpenCL spec, reference card, reference pages, NVIDIA OpenCL programming guide, jumpstart guide, etc.) - or at least I haven't found the one line that does apply in the 300-page spec...

In general I'm looking for information on running a kernel repeatedly. Ultimately the input data (a 128MB buffer of raw data) will vary, but currently it contains zeros - bar a few values to make sure the kernel is reading it properly.
The program flow I'm after is something like:
Code :
/* General initialisation for first pass */

//setup opencl context, command queue, kernel, program etc...
initCL(&clCommandQueue, &clContext, &clKernel);

const size_t inputSize = 128 * 1024 * 1024; //128MB

//initialise input data - hostInputData has been allocated as 'new cl_uchar[inputSize]' and memset/populated with data
cl_mem devInput = clCreateBuffer(clContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, inputSize * sizeof(cl_uchar), hostInputData, &errCode);

//initialise results memory
cl_mem devResults = clCreateBuffer(clContext, CL_MEM_WRITE_ONLY, resultsSize, NULL, &errCode);

size_t globalWorkSize = 262144;
size_t localWorkSize = 512;

/* Main execution loop */

for (uint32_t i = 0; i < loopCount; i++)
{
    //write input data - for the first pass this has already been done (via CL_MEM_COPY_HOST_PTR), but included for completeness
    //ultimately I will employ double buffering with async copies to hide these transfers, but in the spirit of simplicity...
    clEnqueueWriteBuffer(clCommandQueue, devInput, CL_TRUE, 0, inputSize, hostInputData, 0, NULL, NULL);

    //set kernel args
    clSetKernelArg(clKernel, 0, sizeof(cl_mem), &devInput);
    clSetKernelArg(clKernel, 1, sizeof(cl_mem), &devResults);

    //run kernel
    clEnqueueNDRangeKernel(clCommandQueue, clKernel, 1, NULL, &globalWorkSize, &localWorkSize, 0, NULL, NULL);

    //get results - hostResults is allocated during initialisation as 'new cl_uint[resultsSize];'
    //(note the spec's argument order: the device buffer comes second, the host pointer after the size)
    clEnqueueReadBuffer(clCommandQueue, devResults, CL_TRUE, 0, resultsSize, hostResults, 0, NULL, NULL);
}
Note: This is only a pseudo-code representation

(In the background to this OpenCL work a second input host buffer will be populated with data, and the two will be alternated in the clEnqueueWriteBuffer call)

I don't know of any logical reason why something like the above wouldn't work. However, in my test example the first loop iteration shows no problems, but the second causes a segmentation fault immediately after clEnqueueNDRangeKernel. All pointers and memory have been verified as correct and unchanging via %p printfs.

Any general information on re-running a kernel like this or specific gotchas that may occur in a similar scenario would be gratefully received.

Thanks in advance.