clEnqueueNDRangeKernel max global_work_size



matrem
12-03-2009, 02:33 AM
Is there a maximum for global_work_size other than 2^address_bits?

I am trying to pass a bit more than 100,000,000 and clEnqueueNDRangeKernel returns CL_INVALID_VALUE.

I am working on Vista 64-bit with the latest NVIDIA drivers.

dbs2
12-03-2009, 03:09 AM
No. The spec puts no limits on the size except for the size of the type holding the data, which is a size_t. The OpenCL implementation is responsible for breaking up your requested global size into something that will work on the hardware. So if your size is not being accepted it sounds like a bug with the Nvidia implementation.

I would suggest you make sure that your global size is a multiple of 32, though. Otherwise the driver may be forced to use a local size that is non-optimal for the card and you will get substantially worse performance.
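
The rounding suggested above can be sketched as a small host-side helper. This is only an illustration: the name round_up_to_multiple is hypothetical, and it assumes the kernel itself skips the padded work-items (for example by returning early when get_global_id(0) is past the real element count).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round n up to the next multiple of `multiple`
 * (e.g. 32, or the chosen local size). The extra work-items created by
 * the padding must be ignored inside the kernel, for instance with
 * `if (get_global_id(0) >= n) return;`. */
size_t round_up_to_multiple(size_t n, size_t multiple)
{
    assert(multiple > 0);
    size_t remainder = n % multiple;
    return remainder == 0 ? n : n + (multiple - remainder);
}
```

The result would be passed as the global size in place of the raw element count, with the real count handed to the kernel as an extra argument.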

matrem
12-03-2009, 03:23 AM
Thanks.

To be precise, I use (134217728,1,1) for the global size and (512,1,1) for the local size.

Without any nVidia account, is it possible to report OpenCL driver bugs somewhere? Perhaps nVidia guys read this forum?

dbs2
12-03-2009, 04:50 AM
Did you verify via clGetKernelWorkGroupInfo that this local size is valid for your kernel? Since 512 is the maximum the hardware supports, it will only be okay if your kernel uses very few registers. (Or you can just pass in NULL for the local size.)

matrem
12-03-2009, 07:50 AM
Very interesting. I did indeed use 512, the maximum my device supports. But I have now tried letting the implementation choose the local size, and the problem is the same.

If we use too many registers, shouldn't the return value be CL_OUT_OF_RESOURCES?

dbs2
12-03-2009, 11:20 AM
The total local work-group size (the product of the local sizes in all dimensions) has to be less than or equal to the value returned by the clGetKernelWorkGroupInfo call with CL_KERNEL_WORK_GROUP_SIZE. I believe you should get CL_INVALID_WORK_GROUP_SIZE if the size is not valid.
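
As a rough illustration of that rule, the check can be written as plain host code. This is a sketch under assumptions: local_size_fits is a hypothetical name, kernel_max_wg is assumed to have been read beforehand with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE, and the per-dimension device limits (CL_DEVICE_MAX_WORK_ITEM_SIZES) are not checked here. It also applies the OpenCL 1.x requirement that the global size divide evenly by the local size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-flight check for an explicit local size.
 * kernel_max_wg: value assumed to come from clGetKernelWorkGroupInfo
 * queried with CL_KERNEL_WORK_GROUP_SIZE for this kernel and device. */
bool local_size_fits(unsigned work_dim, const size_t *global,
                     const size_t *local, size_t kernel_max_wg)
{
    assert(global != NULL && local != NULL);
    size_t total = 1;
    for (unsigned d = 0; d < work_dim; ++d) {
        /* Global size must be an exact multiple of the local size. */
        if (local[d] == 0 || global[d] % local[d] != 0)
            return false;
        /* Division form avoids overflow when multiplying the sizes. */
        if (local[d] > kernel_max_wg / total)
            return false;
        total *= local[d];
    }
    return true;
}
```

With the sizes from this thread, (512,1,1) passes only if the kernel limit is at least 512; a kernel that uses more registers may report a smaller limit, in which case the check fails before the enqueue call is made.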

matrem
12-03-2009, 11:38 AM
Thank you very much, I had forgotten about this clGetKernelWorkGroupInfo call... It really is essential.

matrem
12-15-2009, 02:56 AM
In fact I get CL_INVALID_VALUE as soon as the ratio of global size to local size (the number of work-groups) is over 65535... I guess the driver stores this ratio in a variable that is too small :)

dbs2
12-15-2009, 05:56 AM
That sounds like a hardware limit that the Nvidia driver is not handling correctly. I would suggest filing a bug against them if you can.

matrem
12-15-2009, 07:47 AM
I can't without an account (I just tried again to create one...).

The 195.181 NVIDIA beta drivers don't fix the problem.

Would it be possible to create a section on this forum for reporting implementation bugs, one that manufacturers could read?

matrem
12-17-2009, 08:51 AM
The NVIDIA OpenCL guide says:

The maximum size of each dimension of a grid of thread blocks is 65535;
So there is a limit, set by NVIDIA, that isn't defined by the specification and that we can't query through the API... This constraint limits portability somewhat :|
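
To make the arithmetic concrete, here is a minimal sketch of the work-group count for the sizes used in this thread. The constant and function names are hypothetical, and 65535 is hardcoded from the guide quoted above precisely because no API query exposes it.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: per-dimension grid limit taken from the NVIDIA guide
 * quoted above; it cannot be queried through the OpenCL API. */
#define MAX_GROUPS_PER_DIM ((size_t)65535)

/* Number of work-groups launched along one dimension. */
size_t work_group_count(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Largest global size along one dimension that stays within the limit. */
size_t max_global_size(size_t local_size)
{
    return MAX_GROUPS_PER_DIM * local_size;
}
```

For a global size of 134217728 and a local size of 512 this gives 262144 work-groups, well over 65535. With a local size of 512, the largest global size that stays under the limit would be 65535 * 512 = 33553920.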

dbs2
01-04-2010, 01:19 AM
It is strange that you can use arbitrary sizes on Nvidia cards on Mac OS X but not with Nvidia's own drivers. All they would have to do is break up the kernel execution at runtime into 65k-sized chunks, since each work-group's execution is necessarily independent.
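
Until a driver does this itself, the same splitting could be done by hand on the host. The sketch below only computes the chunks; chunk_t and split_global_range are hypothetical names, and max_groups is assumed to be the 65535 figure discussed in this thread. Each chunk would then get its own clEnqueueNDRangeKernel call. Under OpenCL 1.0 global_work_offset must be NULL, so the offset would have to be passed to the kernel as an ordinary argument and added to get_global_id(0); OpenCL 1.1 and later accept it through global_work_offset.

```c
#include <assert.h>
#include <stddef.h>

/* One launch: `size` work-items starting at global index `offset`. */
typedef struct {
    size_t offset;
    size_t size;
} chunk_t;

/* Hypothetical helper: split a 1D global range into launches of at most
 * max_groups work-groups each. Writes up to `capacity` chunks into `out`
 * and returns the number of chunks the full range needs. */
size_t split_global_range(size_t global_size, size_t local_size,
                          size_t max_groups, chunk_t *out, size_t capacity)
{
    assert(local_size > 0 && max_groups > 0);
    assert(global_size % local_size == 0);
    size_t max_chunk = max_groups * local_size;
    size_t offset = 0;
    size_t count = 0;
    while (offset < global_size) {
        size_t remaining = global_size - offset;
        size_t size = remaining < max_chunk ? remaining : max_chunk;
        if (count < capacity) {
            out[count].offset = offset;
            out[count].size = size;
        }
        offset += size;   /* never exceeds global_size, so no overflow */
        ++count;
    }
    return count;
}
```

Because both the global size and the chunk size are multiples of the local size, every chunk, including the last one, remains a valid global size for that local size.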