I wrote code like this:
uc4Data is a uchar4* and pCo is a float*:

    float4 f4Sum = (float4)(0.0f);  // start the accumulator at zero
    for (int i = 0; i < length; ++i)
        f4Sum += convert_float4(uc4Data[i]) * pCo[i];

I used the Compute Visual Profiler to check the performance and found that
f4Sum += convert_float4(uc4Data[i]) * pCo[i]; uses 8 registers!
That is too many. How can I reduce the number of registers it uses?
How does the compiler decide how to allocate registers?