The ATI Stream Computing OpenCL™ Programming Guide gives the ATI Radeon HD 5870 as an example: it has 20 compute units, each with 16 stream cores, and each stream core has 5 processing elements, which yields 1600 processing elements in total.
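To check that I am reading those figures correctly, here is the arithmetic as a small Python sketch. The numbers are the ones quoted from the guide; the variable names are my own and only for illustration.

```python
# Figures quoted from the guide's HD 5870 example; the names are my own.
COMPUTE_UNITS = 20
STREAM_CORES_PER_CU = 16
PES_PER_STREAM_CORE = 5  # processing elements inside one stream core

# Total stream cores across the whole device.
stream_cores = COMPUTE_UNITS * STREAM_CORES_PER_CU

# Total processing elements across the whole device.
processing_elements = stream_cores * PES_PER_STREAM_CORE

print(stream_cores)         # 320
print(processing_elements)  # 1600
```

So the two candidate counts are 320 (stream cores) and 1600 (processing elements).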

However, other parts of the guide introduce the notion of wavefronts, which suggests that I have to supply more work-items than there are stream cores, first of all because each stream core executes a VLIW instruction.
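My current understanding, which may be wrong, is that work-items are scheduled in whole wavefronts, so the global size should at least be a multiple of the wavefront size. The sketch below assumes a wavefront of 64 work-items for this GPU; that value and the helper name `round_up_to_wavefront` are my assumptions, not something taken from the passages quoted here.

```python
WAVEFRONT_SIZE = 64  # assumed wavefront size for the HD 5870; verify on your device


def round_up_to_wavefront(n_items, wavefront_size=WAVEFRONT_SIZE):
    """Return the smallest multiple of wavefront_size that is >= n_items."""
    return ((n_items + wavefront_size - 1) // wavefront_size) * wavefront_size


print(round_up_to_wavefront(320))   # 320
print(round_up_to_wavefront(1000))  # 1024
print(round_up_to_wavefront(1600))  # 1600
```

This only shows how a work-item count would be padded to whole wavefronts; it does not say how many wavefronts are needed to keep the device busy.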

How many work-items do I have to launch?

Should I count only the stream cores (320), or all of the processing elements (1600)?