With Nvidia's current OpenCL release, clEnqueueNDRangeKernel blocks the calling process until the kernel has completed. The OpenCL specification is not explicit about whether clEnqueueNDRangeKernel blocks or not, but I think a non-blocking version would be much more useful, because the host could then do other work while the kernel runs and synchronize later with clFinish or an event.

So my question is: is clEnqueueNDRangeKernel meant to be non-blocking (in which case the Nvidia implementation is buggy), or may the implementer choose either blocking or non-blocking behavior (in which case the Nvidia implementation is less useful, but valid)? Could somebody involved in the OpenCL specification please comment on this?
Thanks & kind regards,