Hi
I wrote code like this:
float4 f4Sum = (float4)(0.0f);  // initialize the accumulator before summing
for (int i = 0; i < length; ++i)
    f4Sum += convert_float4(uc4Data[i]) * pCo[i];
uc4Data is an array of uchar4.
pCo is a float*.
I used the Compute Visual Profiler to check the performance and found that the line
f4Sum += convert_float4(uc4Data[i])*pCo[i];
uses 8 registers.
That seems like too many. How can I reduce the number of registers it uses?
How does the compiler allocate registers?
Thank you!