Hi all,
Does anyone know how to accelerate vector normalization?
A little precision can be sacrificed.
P.S. Right now I use an inverse sqrt function followed by multiplications.
What's your target CPU? Do you need the vector normalization in fixed or floating point?
Also, could you please post your existing normalization code so we have a reference to start with? If possible, include the compiler's assembly output.
Nils
My target CPU is the ARM1156T2-S.
I need vector normalization in 15.16 fixed point.
I precomputed the inverse sqrt table and use one Newton-Raphson iteration.
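#define DG_F int
#define DGint32 int
#define DGfixed32 int
#define DGfixed64 long long
#define DGX_FRAC_BITS 16 /* fractional bits of the 15.16 format */
#define DGX_ONE 0x00010000
#define DGX_ZERO 0x0
/* full 64-bit product of two fixed-point values */
#define fMul32x32(a,b) ( (DGfixed64)(a) * (b) )
/* shift a 64-bit product back down to 15.16 */
#define sar_64_32(a) ( (DGfixed32)( (a) >> 16 ) )
/* 15.16 x 15.16 -> 15.16 multiply */
#define fMul(a,b) ( (DGfixed32)( ( (DGfixed64)(a) * (b) ) >> DGX_FRAC_BITS ) )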
DG_F DG_MATH_InvSqrt(DG_F a); /* defined below */

DGvoid DG_MATH_Normalize2B(DG_F *vec)
{
    DG_F length;

    /* squared length, accumulated in 64 bits and shifted back to 15.16 */
    length = sar_64_32( fMul32x32(vec[0], vec[0]) +
                        fMul32x32(vec[1], vec[1]) +
                        fMul32x32(vec[2], vec[2]) );
    //length = EGL_InvSqrt( 0xfffe );
    length = DG_MATH_InvSqrt( length ); /* now holds 1/sqrt(squared length) */
    vec[0] = fMul(vec[0],length);
    vec[1] = fMul(vec[1],length);
    vec[2] = fMul(vec[2],length);
}
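DG_F
DG_MATH_InvSqrt(DG_F a)
{
    DG_F x;
    DGint32 exp;

    if ( a == DGX_ZERO ) return 0x7fffffff;
    if ( a == DGX_ONE ) return a;
    /* count leading zeros of a (ARM compiler inline assembly) */
    __asm
    {
        CLZ exp, a;
    }
    /* table lookup for the initial guess; the shift below needs exp <= 24
       (a >= 0x80), because a negative shift count is undefined in C */
    if ((exp&1)==0)
        x = DG_context->g_pTInvSqrt_EVEN[(a>>(24-exp))&0x7f]; //28:8, 27:16, 26:32, 25:64, 24:128
    else // &7 &f &1f &3f &7f
        x = DG_context->g_pTInvSqrt_ODD[(a>>(24-exp))&0x7f];
    /* scale the table value by the exponent of a */
    exp -= 16;
    if (exp <= 0)
        x >>= (-exp)>>1;
    else
        x <<= (exp>>1)+(exp&1);
    /* one Newton-Raphson iteration: x = (x/2) * (3 - a*x*x) */
    x = fMul((x>>1),(DGX_ONE*3 - fMul(fMul(a,x),x)));
    return x;
}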
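For anyone who wants to try the Newton-Raphson step above without the lookup tables or the CLZ instruction, here is a minimal portable sketch. It is not the code from the post: the names fx16, fx_mul, fx_rsqrt_guess, fx_rsqrt_step and fx_rsqrt are made up for illustration, and the table lookup is swapped for a crude power-of-two initial guess, so it needs about four iterations where the table-seeded version gets away with one.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fx16;                 /* 16.16 fixed point (hypothetical name) */
#define FX_ONE ((fx16)0x00010000)

/* 16.16 x 16.16 -> 16.16 multiply through a 64-bit intermediate */
static fx16 fx_mul(fx16 a, fx16 b)
{
    return (fx16)(((int64_t)a * b) >> 16);
}

/* Power-of-two guess for 1/sqrt(a), within a factor of sqrt(2) of the
   true value. This replaces the table lookup and is an assumption of
   this sketch, not part of the posted code. */
static fx16 fx_rsqrt_guess(fx16 a)
{
    int msb = 0;
    assert(a > 0);                    /* the caller filters out a <= 0 */
    while ((a >> msb) > 1)            /* index of the highest set bit */
        msb++;
    return (fx16)1 << (24 - (msb + 1) / 2);
}

/* One Newton-Raphson step for 1/sqrt(a): x' = (x/2) * (3 - a*x*x) */
static fx16 fx_rsqrt_step(fx16 a, fx16 x)
{
    return fx_mul(x >> 1, 3 * FX_ONE - fx_mul(fx_mul(a, x), x));
}

static fx16 fx_rsqrt(fx16 a)
{
    fx16 x;
    int i;
    if (a <= 0)
        return 0x7fffffff;            /* same saturation the post uses for zero */
    x = fx_rsqrt_guess(a);
    for (i = 0; i < 4; i++)           /* 4 steps because the guess is coarse */
        x = fx_rsqrt_step(a, x);
    return x;
}
```

Calling fx_rsqrt(0x00040000), i.e. 4.0, gives 0x00008000, i.e. 0.5. The better the initial guess, the fewer steps are needed, which is the whole point of the tables in the posted version.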
Hi wycwang,
could you please post the assembly output of the DG_MATH_Normalize2B function? If you're using GCC (most likely) you can get it if you compile with the options -O3 -S somefile.c
That generates a somefile.s which should contain the assembly code.
The code looks fine. I'm almost sure the compiler just needs to be hinted into generating good code for it.
Nils
Hi Nils,
Of course this code is fine.
I already compile with -O3,
and I also have a hand-optimized assembly version.
But I think I need a faster normalization method:
a different algorithm, not just more code optimization.
There aren't any other ways to get the performance up without losing even more precision.
You could use this method of distance approximation:
http://www.oroboro.com/rafael/docserv.p ... e/distance
This could be just as slow, though, since you still need a divide afterwards. The distance approximation itself should compile to nice and fast code on any ARM.
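To give an idea of what such a distance approximation can look like, here is a small sketch using only compares, shifts and adds. The weights 1, 11/32 and 1/4 are an assumption (one commonly quoted choice for 3D, with an error of roughly 9% at worst) and are not necessarily what the linked page uses; the function name approx_length3 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate sqrt(x*x + y*y + z*z) as max + 11/32*mid + 1/4*min.
   Works on any fixed-point format because it is linear in the inputs.
   The weights are an assumption of this sketch. */
static int32_t approx_length3(int32_t x, int32_t y, int32_t z)
{
    int32_t a = x < 0 ? -x : x;
    int32_t b = y < 0 ? -y : y;
    int32_t c = z < 0 ? -z : z;
    int32_t t;

    /* sort so that a >= b >= c */
    if (a < b) { t = a; a = b; b = t; }
    if (b < c) { t = b; b = c; c = t; }
    if (a < b) { t = a; a = b; b = t; }

    assert(a <= INT32_MAX / 2);       /* keeps the sum below from overflowing */

    /* 11/32 = 1/4 + 1/16 + 1/32, done with shifts only */
    return a + ((b >> 2) + (b >> 4) + (b >> 5)) + (c >> 2);
}
```

For the 15.16 vector (3.0, 4.0, 0.0) this returns 329728, about 5.03, against an exact length of 5.0. Normalizing with it still requires a reciprocal of the result, which is the divide mentioned above.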
Hi Nils,
This method seems to need a division (or a reciprocal).
Since my reciprocal is also implemented with Newton-Raphson,
I don't think this method would be any faster.
I think cube map normalization is a nice idea, but how to compute the access index is another issue.
What access index?