I've noticed that while CL_DEVICE_PREFERRED_VECTOR_WIDTH_* can be queried to determine the optimum vector width for an architecture, actually changing the vector width of your program in an automated way has some unexpected pitfalls.
It makes sense to add a "-DVECTYPE=" switch to the compile options to match the preferred vector width returned by the device, so that, for instance, 'float16', 'float8', 'float4', 'float2' or 'float' can be selected.
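As a rough host-side sketch of that idea (the function names and the extra -DVECWIDTH= define are hypothetical, not part of any API), the width reported by clGetDeviceInfo for CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT could be turned into a build-options string like this:

```c
#include <stddef.h>
#include <stdio.h>

/* Clamp a reported width to one of the float vector sizes mentioned above.
   Anything unexpected (including 0) falls back to scalar. */
unsigned normalize_vec_width(unsigned preferred_width)
{
    switch (preferred_width) {
    case 2: case 4: case 8: case 16:
        return preferred_width;
    default:
        return 1;
    }
}

/* Write e.g. "-DVECTYPE=float4 -DVECWIDTH=4" into buf.
   VECWIDTH is a hypothetical companion define so that kernel code can
   special-case the scalar build. Returns snprintf's result. */
int format_vec_build_options(char *buf, size_t size, unsigned preferred_width)
{
    unsigned w = normalize_vec_width(preferred_width);
    if (w == 1)
        return snprintf(buf, size, "-DVECTYPE=float -DVECWIDTH=1");
    return snprintf(buf, size, "-DVECTYPE=float%u -DVECWIDTH=%u", w, w);
}
```

The resulting string would then be passed as the options argument of clBuildProgram.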
However, I've had trouble automating the type selection when the preferred vector width is one.
Essentially, it seems that (and as far as I know this holds for all scalar versus vector types):
convert_int4((int4)1 > (int4)0) == (int4)(-1, -1, -1, -1)
convert_int2((int2)1 > (int2)0) == (int2)(-1, -1)
convert_int((int)1 > (int)0) == 1
While it makes sense for the comparison result to act as a mask for the vector types and as a simple truth value for the scalar, the inconsistency is a little misleading. It might be better if there were an 'int1' type that follows the mask-based convention, so that automated type selection works more smoothly. Does this already exist somewhere, or is it something that might be added to the specification? Otherwise, is there an easy way to make a truth value the same across all types of a particular element width?
My solution so far, for anyone finding this later with the same trouble, is a special CONVERT_BOOLEXPR_FVECTYPE macro that adds a minus sign in the single-element case.
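A minimal sketch of what such a macro could look like follows. It is not the exact macro mentioned above: the name BOOLEXPR_TO_MASK is hypothetical, and so is VECWIDTH, which is assumed to be an extra define passed at build time alongside VECTYPE. It uses only the preprocessor and a cast, so the same text should be usable in OpenCL C; in the vector case the comparison already yields 0 / -1 per component, so the expression is passed through unchanged.

```c
/* VECWIDTH: hypothetical define giving the element count of VECTYPE.
   Default to the scalar case if it was not supplied. */
#ifndef VECWIDTH
#define VECWIDTH 1
#endif

#if VECWIDTH == 1
/* Scalar comparisons yield 0 or 1; negating gives 0 or -1,
   matching the per-component mask that vector comparisons produce. */
#define BOOLEXPR_TO_MASK(expr) (-(int)(expr))
#else
/* Vector comparisons already yield 0 or -1 in each component. */
#define BOOLEXPR_TO_MASK(expr) (expr)
#endif
```

With this, BOOLEXPR_TO_MASK(a > b) should give an all-bits-set value for "true" whether VECTYPE is 'float' or one of the vector types, so mask-style code downstream does not need a separate scalar path.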