What is the recommended way to convert serial loops into NDRanges? The loop count is a natural starting value for global_size, but global_size must be a multiple of local_size. That leaves two options: either add a conditional inside the kernel so that only work-items with global_id < loop count do any work, or pad each array argument (with appropriate values) before passing it to the kernel so that its length becomes a multiple of local_size. The second method removes the condition inside the kernel but takes time to resize the array(s). So the first method seems better, but doesn't that preclude one from using barrier(CLK_LOCAL_MEM_FENCE)?
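For reference, the rounding that both options depend on can be sketched on the host side in plain C. This is only an illustration of the "multiple of local_size" requirement described above; the helper name round_up_global_size is hypothetical and not part of the OpenCL API.

```c
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API call): round the loop count n
   up to the next multiple of local_size, so the result can be used as
   global_size. Assumes local_size > 0. */
static size_t round_up_global_size(size_t n, size_t local_size)
{
    size_t remainder = n % local_size;
    /* Already a multiple: keep n. Otherwise add the missing amount. */
    return remainder == 0 ? n : n + (local_size - remainder);
}
```

With a loop count of 1000 and a local_size of 64, this gives a global_size of 1024, so the last 24 work-items have no loop iteration of their own. Those are the work-items that either need the in-kernel condition or need padded array elements to operate on.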
Basically, is it possible to do something like:
if (get_global_id(0) >= n)
I recall the OpenCL 1.1 spec saying that if any work-item encounters a barrier, then all executing work-items must also encounter that barrier. But if a work-item has already returned, then it's no longer an executing work-item, right?