Specify the number of compute units for execution of kernel.



Sayantan
06-03-2011, 11:41 PM
Hello,
My GPU has 10 compute units, but using all of them together makes the system unresponsive during execution. After execution everything returns to normal. I want to use only 5 compute units so that my system remains responsive while the kernel runs. How do I specify the number of compute units to use?

david.garcia
06-04-2011, 04:52 AM
Try submitting less work per call so that the machine stays responsive overall. For example, instead of executing 100,000 work-groups in one call to clEnqueueNDRangeKernel(), make ten calls and run only 10,000 work-groups in each of them. The "global_work_offset" parameter of clEnqueueNDRangeKernel() (http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clEnqueueNDRangeKernel.html) should come in handy.

The method above will work well even on older hardware.
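
As a rough sketch of the splitting idea, the host side could first work out the offset and size of each pass. The chunk_t type and plan_chunks() function are made up for this illustration and are not part of OpenCL; each entry would be passed as global_work_offset and global_work_size in one clEnqueueNDRangeKernel() call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not part of OpenCL): describes one pass. */
typedef struct {
    size_t offset; /* value to pass as global_work_offset[0] */
    size_t size;   /* value to pass as global_work_size[0]   */
} chunk_t;

/*
 * Splits `total` work-items into passes of at most `chunk` items each.
 * Writes up to `max_chunks` entries into `out` and returns the number of
 * passes needed. The last pass is shorter when `total` is not a multiple
 * of `chunk`.
 */
size_t plan_chunks(size_t total, size_t chunk, chunk_t *out, size_t max_chunks)
{
    assert(chunk > 0);
    size_t count = 0;
    for (size_t offset = 0; offset < total; offset += chunk) {
        size_t remaining = total - offset;
        if (count < max_chunks) {
            out[count].offset = offset;
            out[count].size = remaining < chunk ? remaining : chunk;
        }
        count++;
    }
    return count;
}
```

With total = 100 and chunk = 50 this yields two passes, (0, 50) and (50, 50). If you pass an explicit local_work_size instead of NULL, keep in mind that in OpenCL 1.x each pass size has to be a multiple of it, so a shorter last pass may need padding.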

Sayantan
06-04-2011, 06:24 AM
Thanks for your help. I understand what you are saying, but can you give me an example of how to use global_work_offset in clEnqueueNDRangeKernel()? Suppose I have 100 work-items and I want to split them into two halves, 0-49 and 50-99. Assume work_dim = 1.

ilektrik
06-04-2011, 12:52 PM
You can try something like this:


int workSize = 100;   // total number of work-items
int chunkSize = 50;   // work-items per pass
int passes = 2; // this value is obvious in this example (workSize / chunkSize)
size_t globalWorkSize[1] = {chunkSize};
size_t globalWorkOffset[1] = {0};

for (int i = 0; i < passes; i++)
{
    // "OpenCL" is the cl_kernel object here
    clEnqueueNDRangeKernel(GPUCommandQueue, OpenCL, 1, globalWorkOffset, globalWorkSize, NULL, 0, NULL, NULL);
    // read results by clEnqueueReadBuffer() with blocking set to CL_TRUE
    globalWorkOffset[0] += globalWorkSize[0]; // advance to the next chunk
}

Hope code is OK ;)

Of course take a look at:
http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clEnqueueNDRangeKernel.html
http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clEnqueueReadBuffer.html

david.garcia
06-04-2011, 02:18 PM
Yeah, something like what ilektrik suggests. If I may make a couple of little changes,



size_t globalWorkSize = 50; // work-items per pass
int passes = 2; // this value is obvious in this example
size_t globalWorkOffset = 0;

for (int i = 0; i < passes; i++)
{
    clEnqueueNDRangeKernel(GPUCommandQueue, OpenCL, 1, &globalWorkOffset, &globalWorkSize, NULL, 0, NULL, NULL);
    globalWorkOffset += globalWorkSize; // advance to the next chunk
}
// Here you can read results by clEnqueueReadBuffer()
// with blocking set to CL_TRUE


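Just keep in mind that get_global_size(0) will now return 50 instead of 100. With OpenCL 1.1, get_global_id(0) already includes the offset, so it runs from 0 to 49 in the first pass and from 50 to 99 in the second. Your kernel source can call get_global_offset(0) if it needs to know where its portion of the computation starts.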
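
As a rough illustration of how the passes cover the whole range, here is a plain-C simulation that makes no OpenCL calls. It assumes OpenCL 1.1 behavior, where get_global_id(0) already includes the offset passed to clEnqueueNDRangeKernel(). The function names simulate_pass() and covers_range_once() are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Plain-C stand-in for a 1-D kernel launch; no OpenCL calls are made.
 * Assumes OpenCL 1.1 semantics: get_global_id(0) runs from `offset`
 * to `offset + size - 1`, and get_global_size(0) equals `size`.
 * Each simulated work-item increments hits[gid].
 */
static void simulate_pass(size_t offset, size_t size, int *hits)
{
    for (size_t i = 0; i < size; i++) {
        size_t gid = offset + i; /* what get_global_id(0) would return */
        hits[gid]++;
    }
}

/*
 * Runs `total` work-items in passes of `chunk` items (total must be a
 * multiple of chunk here) and returns 1 if every index in [0, total)
 * was processed exactly once, 0 otherwise. `hits` must hold `total` ints.
 */
int covers_range_once(size_t total, size_t chunk, int *hits)
{
    assert(chunk > 0 && total % chunk == 0);
    for (size_t i = 0; i < total; i++)
        hits[i] = 0;
    for (size_t offset = 0; offset < total; offset += chunk)
        simulate_pass(offset, chunk, hits);
    for (size_t i = 0; i < total; i++)
        if (hits[i] != 1)
            return 0;
    return 1;
}
```

For 100 work-items in two passes of 50, every index from 0 to 99 is touched exactly once, which is the property the real kernel relies on when it indexes its buffers with get_global_id(0).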

Sayantan
06-04-2011, 08:06 PM
Thanks a lot, guys. The technique works great. :D