I'm a bit lost about how many work-items I need to launch to run my application,
because the concept of a processing element differs between AMD and Nvidia GPUs (stream cores vs. CUDA cores).

The GTX 560 specifications say it has 336 CUDA cores. Does this mean I have to launch 336 work-items to make full use of this GPU's parallel processing power?