I'm currently working with OpenCL, and my main kernel uses a high number of registers per thread.
The main kernel uses quite a lot of float4 values, but most of them could actually be float3. I know that cl_float3 is a typedef of cl_float4, and I also know that float3 on the device side takes 16 bytes, the same size and alignment as float4.
Am I right in thinking that the extra, unused float wastes a register?
If so, is there a way to work around this?
Sorry for my bad English.