I don't understand how an OpenCL kernel function is executed.
I want to compute something in an OpenCL kernel, so I use 256 work-items to perform the computation, and I want to get the final result.
Does a single call to clEnqueueNDRangeKernel() execute all 256 work-items?
Or do I need to use a for loop? (If I use a for loop, do I have to repeat the call 16*16 times?)