for loops in kernel
According to the book "OpenCL in action":
"Comparisons are time-consuming on the best of processors, but they’re especially slow on dedicated number-crunchers like graphic processor units (GPUs). GPUs excel at performing the same operations over and over again, but they’re not good at making decisions. If a GPU has to check a condition and branch, it may take hundreds of cycles before it can get back to crunching numbers at full speed."
But in this great book there are a few samples where the kernel contains 'for' loops:
matrix transposition: page 261
matrix multiplication: page 264
DFT: page 314
My question is: is it possible to avoid 'for' and 'while' loops in kernel functions?
And another one: let's say I have only 5 work groups. Does that mean I need 5 cores?
Am I right?
The book puts this poorly.
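GPUs are quick at "making decisions". The issue with branching is *divergent* branching within a work group: when work items that execute in lockstep take different sides of a branch, the hardware runs both sides one after the other, and each work item sits idle during the side it did not take. That is where the performance penalty comes from.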
"For" loops with fixed iteration counts do not fall into this category, because every work item takes the same path through them. However, any control flow that depends on the global_id or on data you read could cause divergent branching.
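To make the difference concrete, here is a rough sketch in plain C (not a kernel, so it runs on the host without OpenCL). It assumes a deliberately simplified cost model: the work items of one lockstep group pay for every branch side that at least one of them takes, and a loop runs until the work item with the most iterations is done. The names `branch_cost`, `loop_cost`, `THEN_COST` and `ELSE_COST` are made up for illustration, and the cost numbers are arbitrary; real hardware is more complicated.

```c
#include <stddef.h>

/* Hypothetical cost model, not a real API: the arbitrary cost of each
   side of an if/else, in made-up "cycles". */
enum { THEN_COST = 10, ELSE_COST = 10 };

/* Cost of `if (cond[lane]) A; else B;` for one lockstep group.
   The group pays for every side that at least one lane takes. */
int branch_cost(const int *cond, size_t lanes)
{
    int any_then = 0;
    int any_else = 0;
    for (size_t i = 0; i < lanes; ++i) {
        if (cond[i])
            any_then = 1;
        else
            any_else = 1;
    }
    return any_then * THEN_COST + any_else * ELSE_COST;
}

/* Cost of `for (k = 0; k < trips[lane]; ++k) body;` for one lockstep
   group. The group keeps iterating until the lane with the largest
   trip count has finished; lanes that finish early just wait. */
int loop_cost(const int *trips, size_t lanes, int body_cost)
{
    int max_trips = 0;
    for (size_t i = 0; i < lanes; ++i) {
        if (trips[i] > max_trips)
            max_trips = trips[i];
    }
    return max_trips * body_cost;
}
```

Under this model, a branch where all lanes agree costs one side only, while a condition such as "is my global id even" costs both sides. A loop where every lane runs 8 iterations costs exactly those 8 iterations with nothing wasted; a loop whose trip count varies per lane costs as much as its slowest lane, and the other lanes idle in the meantime.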
It is something to be aware of, but it doesn't have to be avoided at all costs.
Originally Posted by Dithermaster
Thank you for your help!