
CL_INVALID_WORK_GROUP_SIZE on MacOS 10.6.2 with ATI



luarwick
11-19-2009, 07:56 AM
I have MacOS 10.6.2 (Snow Leopard) and an ATI Radeon HD4870. For this card, CL_DEVICE_MAX_WORK_GROUP_SIZE=1024. However, for some reason I cannot use any work-group size larger than 256. For example, if I try to use 512x1x1, clEnqueueNDRangeKernel reports CL_INVALID_WORK_GROUP_SIZE. Any ideas why this is happening? Could it be something inherent to ATI Stream?

Here's my code (error handling stripped, since no errors are generated in the middle):



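/* pick the first GPU on the first platform */
clGetPlatformIDs(max_num_platforms, platforms, &num_platforms);
clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_GPU, num_entries, devices, &num_devices);
device = devices[0];

/* create a context and a command queue for that device */
context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
cmd_queue = clCreateCommandQueue(context, device, 0, NULL);

/* build the program and get the "inc" kernel */
program = clCreateProgramWithSource(context, 1, &kernel_str, NULL, NULL);
clBuildProgram(program, 1, &device, NULL, NULL, NULL);
kernel = clCreateKernel(program, "inc", NULL);

/* copy n floats from srcA into a device buffer and bind it to argument 0 */
cl_mem memobj = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, sizeof(float)*n, srcA, NULL);
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&memobj);

/* 512x1x1: this is the call that returns CL_INVALID_WORK_GROUP_SIZE */
size_t gws = 512, lws = 512;
err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, &gws, &lws, 0, NULL, NULL);
assert(err==CL_SUCCESS);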


The kernel is very simple:


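/* the buffer is created from floats on the host and is modified in place,
   so the argument is a non-const float pointer */
__kernel void inc (__global float *a) {
    int k = get_global_id(0);
    ++a[k];
}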

affie
11-19-2009, 08:26 AM
You should use the work-group size returned by clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE, ...). Can you check what this value is?

Note that CL_DEVICE_MAX_WORK_GROUP_SIZE is the largest work-group size the device can support at all, but the largest size usable for a particular kernel can be smaller, and it varies from kernel to kernel depending on the resources the kernel uses. You should always use the value returned by clGetKernelWorkGroupInfo.
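
As a rough sketch of how the host side could apply this, the two helpers below clamp a requested local size to the per-kernel limit and pad the global size to a multiple of it (OpenCL 1.0 requires the global size to be evenly divisible by the local size). The function names pick_local_size and round_up_global are made up for illustration and are not part of OpenCL; kernel_max is assumed to have been filled in by clGetKernelWorkGroupInfo, as shown in the comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL API.
 * kernel_max is assumed to come from a query such as:
 *   size_t kernel_max;
 *   clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
 *                            sizeof(size_t), &kernel_max, NULL);
 * Returns the requested local size, capped at the per-kernel limit. */
size_t pick_local_size(size_t requested, size_t kernel_max)
{
    assert(requested > 0 && kernel_max > 0);
    return requested < kernel_max ? requested : kernel_max;
}

/* Hypothetical helper, not an OpenCL API.
 * Rounds n up to the next multiple of local, so that the global size
 * passed to clEnqueueNDRangeKernel is evenly divisible by the local size. */
size_t round_up_global(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}
```

With the numbers from this thread, pick_local_size(512, 256) gives 256 and round_up_global(512, 256) stays at 512, so the launch would run as two work-groups of 256. If the global size does get padded beyond the buffer length, the kernel needs a bounds check on get_global_id(0) before touching the buffer.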

luarwick
11-19-2009, 08:47 AM
Thank you, this explains everything. It reports max available work group size = 256 for my kernel.

jonathan81
11-19-2009, 09:46 AM
Hi,
With an ATI Radeon 5870 I have the same problem (on NT):
MAX WORK ITEM = 1024, but it only works with a max size of 256.
I think it's just a bug in the AMD driver for the moment.