specify work group sizes



duanmu
10-21-2009, 10:49 AM
One can specify work-group sizes with the kernel attribute reqd_work_group_size and/or with the local_work_size argument of clEnqueueNDRangeKernel(). If reqd_work_group_size is specified, then local_work_size must match it. I wonder what the rationale behind this is. Why can't local_work_size be NULL?

affie
10-21-2009, 12:49 PM
The intent is that if reqd_work_group_size is specified, then you cannot pass a different local_work_size value. NULL is OK, as it will use the values specified with reqd_work_group_size. You cannot specify a non-NULL local_work_size with values that differ from those specified with reqd_work_group_size.

duanmu
10-21-2009, 05:03 PM
Actually, a NULL value won't work, at least with NVIDIA's implementation, and the OpenCL 1.0 specification implies the same.

affie
10-21-2009, 06:29 PM
I need to correct my previous post. A NULL value cannot be used when reqd_work_group_size is specified; local_work_size must be exactly the values specified with reqd_work_group_size. The reason is that global_work_size must be a multiple of local_work_size, which in this case means a multiple of reqd_work_group_size. The OpenCL implementation may have optimized the program executable (and kernel code) for this work-group size. Therefore neither NULL nor any value other than reqd_work_group_size can be used.
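
On the host side, the divisibility requirement is commonly met by padding the global size up to the next multiple of the work-group size. Here is a minimal sketch in plain C, assuming a hypothetical helper named round_up_to_multiple (it is not part of the OpenCL API). The kernel would then need its own bounds check so that the padded work-items do nothing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL function.
 * Rounds `global` up to the nearest multiple of `group_size`, so the
 * result can be passed as one entry of global_work_size when the kernel
 * requires a fixed work-group size. `group_size` must be non-zero. */
size_t round_up_to_multiple(size_t global, size_t group_size)
{
    assert(group_size != 0);
    size_t remainder = global % group_size;
    /* Already a multiple: leave it unchanged; otherwise pad it. */
    return remainder == 0 ? global : global + (group_size - remainder);
}
```

For example, 1000 work-items with a required work-group size of 64 would be enqueued as 1024, with the same 64 passed in local_work_size.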

duanmu
10-22-2009, 07:47 AM
It makes some sense. Alternatively, the standard could allow the kernel to be queued even with local_work_size=NULL as long as global_work_size is a multiple of reqd_work_group_size, and make clEnqueueNDRangeKernel() fail otherwise.
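
The alternative rule proposed above can be modelled in a few lines of plain C. This is only a sketch of the suggestion, not what any OpenCL implementation does. The function name check_launch, the enum values, and the reqd array (standing in for the sizes given to reqd_work_group_size) are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; real OpenCL reports CL_INVALID_WORK_GROUP_SIZE. */
enum launch_check { LAUNCH_OK = 0, LAUNCH_BAD_GROUP_SIZE = 1 };

/* Models the proposed rule (not actual OpenCL behaviour):
 *  - a non-NULL `local` must equal `reqd` in every dimension;
 *  - a NULL `local` is accepted and `reqd` is used in its place;
 *  - either way, `global` must be a multiple of `reqd` in every dimension. */
enum launch_check check_launch(size_t work_dim,
                               const size_t *global,
                               const size_t *local, /* may be NULL */
                               const size_t *reqd)
{
    assert(global != NULL && reqd != NULL);
    for (size_t d = 0; d < work_dim; ++d) {
        if (local != NULL && local[d] != reqd[d])
            return LAUNCH_BAD_GROUP_SIZE;
        if (reqd[d] == 0 || global[d] % reqd[d] != 0)
            return LAUNCH_BAD_GROUP_SIZE;
    }
    return LAUNCH_OK;
}
```

Under this model, a global size of {128, 64} with required sizes {64, 8} passes whether local is NULL or {64, 8}, while a global size of {100, 64} fails in both cases.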

yoavhacohen
01-23-2012, 04:55 PM
Agree with duanmu.