OK, I am just getting into OpenCL, and I am having a lot of confusion with respect to how hardware groups map to software groups.
I have a GTX 285, which has 240 CUDA streaming processors. But when I run the device query program from the NVIDIA GPU Computing SDK for OpenCL, it shows:
CL_DEVICE_MAX_COMPUTE_UNITS: 30
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
1. Why is OpenCL showing 30 compute units when my card has 240 streaming processors? I guess a compute unit is not the same thing as a streaming processor? So then what is a compute unit?
2. The max work group size is 512, so does that mean a work group can have at most 512 threads/work-items? And can I have any number of work groups, of any dimension?
3. A work group is obviously a logical abstraction, so does a work group span multiple streaming processors? For example, a work group with more work-items than the warp size.
4. What is the logic behind the 16 KB of local memory assigned to a work group? If a work group is logical and not hardware, how can it get fast 16 KB local memory from a streaming processor? (This one drives me nuts.)
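For reference, this is roughly what using that local memory looks like from the kernel side. This is OpenCL C kernel source, not host code, and the kernel name and sizes are made up for illustration; the point is that each work-group gets its own private copy of the `__local` buffer:

```c
// Illustrative OpenCL C kernel (device source, not host C).
// Each work-group gets its own copy of `tile`; 64 floats is 256 bytes
// out of the 16 KB of local memory available per compute unit.
__kernel void scale_with_local(__global const float *in,
                               __global float *out)
{
    __local float tile[64];            // visible to one work-group only
    size_t lid = get_local_id(0);
    size_t gid = get_global_id(0);

    tile[lid] = in[gid];               // stage data in fast local memory
    barrier(CLK_LOCAL_MEM_FENCE);      // sync work-items within the group

    out[gid] = tile[lid] * 2.0f;
}
```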
5. Can two CPU threads have their own kernels, doing the same task in parallel? Obviously their data is local to each of them, so there is no need to synchronize.
6. Can I pre-allocate a 1D array of size n for a kernel at program load, and then use the same kernel, but a different instance of it, for each CPU thread?
I know that's a lot of questions, but a noob's gotta know all this before getting his hands dirty, or stick to SSE.