When running an OpenCL data-parallel kernel on an SSE-enabled CPU, does OpenCL automatically generate SSE code that maps work items to the lanes of the SSE registers? Or do you have to write the kernel with the OpenCL vector data types (such as float4) to take advantage of SSE?

The manual seems to suggest that with the data-parallel programming model, SSE code is generated automatically, whereas with the task-parallel model you have to use the vector data types. However, I've seen comments around the web suggesting that you always have to use the vector data types to get SSE code.