Please go easy on me and help me understand a few things. I have read a lot of documentation but am still confused about some parts, and I hope you can help break it down into simpler terms for me.
1) Does the number of work-groups affect execution in any meaningful way? Or, are they simply there to provide an optional means of simplifying a problem for the developer?
2) How does one queue an arbitrary number of work-items on a GPU? For example, say my algorithm requires me to execute 233 instances of a kernel in parallel on the GPU. How is this typically done?
On my machine, 512 seems to be the minimum number of work-items I can enqueue. Would I queue up 512 instances of the kernel (work-items) and have the last 279 instances do nothing? Thanks ahead of time, and I appreciate any well-thought-out responses.
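To make question 2 concrete, here is a small Python sketch of the padding scheme I have in mind: round the global size up to a multiple of the work-group size, then have the padded work-items bail out early. The work-group size of 64 and the doubling "work" are just placeholders I made up, not from any real code.

```python
def round_up(n, multiple):
    # Round n up to the next multiple; in OpenCL 1.x the global size
    # must be evenly divisible by the work-group (local) size.
    return ((n + multiple - 1) // multiple) * multiple

NUM_ITEMS = 233    # real work items my algorithm needs
LOCAL_SIZE = 64    # placeholder work-group size

global_size = round_up(NUM_ITEMS, LOCAL_SIZE)  # 256 work-items enqueued

# Simulate each work-item guarding against the padding,
# like an early return at the top of the kernel.
results = []
for gid in range(global_size):
    if gid >= NUM_ITEMS:
        continue               # padded work-items do nothing
    results.append(gid * 2)    # stand-in for the real work
```

Is this guard-and-pad pattern the standard way to handle it, or is there a better mechanism?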