Hi, I was wondering if there is any way to find out the maximum number of processors/stream processors/threads inside a single compute unit. I know that, for example, on the Radeon 4870 one OpenCL compute unit does not correspond to one thread/stream processor, i.e. one compute unit can run more than one thread at a time.
It would be useful to have some mechanism for seeing how many sub-processors there are in a compute unit. For example, if I have an array of length 100 and 10 compute units, how do I know how to distribute the array across them? If there were 100 threads in one compute unit, it would be best to put the whole array on the first compute unit (local work size = 100 and global work size = 100) and use the other 9 for something else.
If I were to divide the 100 elements equally across the 10 compute units instead (10 elements each), each compute unit would leave its remaining 90 threads idle for that execution, right?
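To make the numbers concrete, here is a rough sketch of the arithmetic I have in mind. It is plain C, not OpenCL API code: `groups_needed` and `idle_lanes` are hypothetical helper names, and the model assumes that each work-group occupies exactly one compute unit and that a compute unit has a fixed number of lanes (`lanes_per_unit`), which is a simplification of how real hardware schedules work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL function): number of work-groups
 * needed to cover `global_items` work-items when each work-group holds
 * `local_size` work-items. Uses ceiling division. */
size_t groups_needed(size_t global_items, size_t local_size)
{
    assert(local_size > 0);
    return (global_items + local_size - 1) / local_size;
}

/* Hypothetical helper: lanes left idle in total, ASSUMING one work-group
 * per compute unit and `lanes_per_unit` lanes in each compute unit.
 * Requires local_size <= lanes_per_unit. */
size_t idle_lanes(size_t global_items, size_t local_size,
                  size_t lanes_per_unit)
{
    assert(local_size > 0 && local_size <= lanes_per_unit);
    size_t groups = groups_needed(global_items, local_size);
    /* Every occupied unit offers lanes_per_unit lanes; only
     * global_items of them do useful work. */
    return groups * lanes_per_unit - global_items;
}
```

Under those assumptions, `idle_lanes(100, 100, 100)` gives 0 (one work-group, fully used), while `idle_lanes(100, 10, 100)` gives 900 (ten work-groups, each leaving 90 lanes idle), which is the waste I am worried about.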
I hope you can see my problem. Any suggestions for overcoming it? Are there any plans to add a device query for the maximum number of sub-compute units?
I know that on some NVIDIA cards one stream processor seems to correspond to one OpenCL compute unit.