Hi folks; it's been a while but I thought I'd post my final problem here and see if it resonates with anyone.
I have a kernel that runs beautifully on nVidia, which I am vectorizing for an AMD 5870.
I'm doing some trig, so to eliminate branching I have turned things like
if ( T < 0.f ) T += 360.f;
into
T += ( T < 0.f ) * 360.f;
( This is of course necessary so as to process all 4 elements of the vector in one go; if the branch were used, it would have to be performed individually for each element of the vector.)
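To make the scalar rewrite concrete, here is a minimal sketch in plain host-side C (not the actual kernel). The function names wrap_branch and wrap_branchless are hypothetical, chosen only for illustration. It relies on a scalar comparison evaluating to 1 or 0.

```c
/* Branching version: add a full turn only when the angle is negative. */
static float wrap_branch(float t)
{
    if (t < 0.f)
        t += 360.f;
    return t;
}

/* Branchless version: a scalar comparison evaluates to 1 (true) or
   0 (false), so it can be used directly as a multiplier. */
static float wrap_branchless(float t)
{
    t += (float)(t < 0.f) * 360.f;
    return t;
}
```

Both functions should agree for any input, e.g. -90.f becomes 270.f and 45.f is left alone.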
... all cool, and the logic is good; it works for plain (scalar) floats. (And it doesn't hurt performance, even though it's doing 8 complex calculations for 8 different conditions; I am happily surprised.)
-> However, when you're using float4s, the value of a comparison is different. Instead of getting +1 back for each true comparison, you get -1 in that lane (the result is an int4 holding 0 or -1 per element). SO, in order to use it in a calculation, as above, it's necessary to somehow change that -1 to a +1 for the equation to yield what it needs to.
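The effect of that -1 can be modelled on the host in plain C. This is only a sketch under assumptions: 4-element arrays stand in for float4 / int4, and less_than_zero, wrap_naive and wrap_negated are hypothetical names, not anything from the kernel. In OpenCL C itself the int4-to-float4 step would go through convert_float4 rather than a cast, since explicit casts between vector types are not allowed.

```c
#include <stdint.h>

#define LANES 4

/* Model of a vector compare: each lane of the result is -1 (all bits
   set) where t[i] < 0, and 0 otherwise. */
static void less_than_zero(const float t[LANES], int32_t mask[LANES])
{
    for (int i = 0; i < LANES; ++i)
        mask[i] = (t[i] < 0.f) ? -1 : 0;
}

/* Using the mask directly as a multiplier: -1 * 360 SUBTRACTS a turn,
   which is the wrong result described above. */
static void wrap_naive(float t[LANES])
{
    int32_t mask[LANES];
    less_than_zero(t, mask);
    for (int i = 0; i < LANES; ++i)
        t[i] += (float)mask[i] * 360.f;
}

/* Negating the mask first turns -1 into +1, restoring the scalar
   behaviour lane by lane. */
static void wrap_negated(float t[LANES])
{
    int32_t mask[LANES];
    less_than_zero(t, mask);
    for (int i = 0; i < LANES; ++i)
        t[i] += (float)(-mask[i]) * 360.f;
}
```

With the input { -90, 45, -1, 0 }, the naive version pushes the negative lanes further negative, while the negated version wraps them into the 0 to 360 range.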
Changing that sign is what hangs the 5870.
I have #defined FLOGIC(x) to take e.g. ( T < 0.f ) and change the sign of the result, in a number of ways:
#define FLOGIC(x) (float4) -(x);
#define FLOGIC(x) (float4) (x) * -1.f;
#define FLOGIC(x) (float4) ( x * -1 );
#define FLOGIC(x) (float4) ( abs( x ) );
#define FLOGIC(x) fabs( (float4) x );
. . . if I don't do this, if I use the result of the logical comparison as originally depicted way up above, then the kernel compiles and runs beautifully except for the fact that the -1 ruins all the calculations it touches and the results are useless.
. . . if I *DO* do this, if I attempt any of the above-described methods to reverse the sign of the float4 logical comparison, the kernel never comes back. (Same deal if I use a function instead of a #define. I can do anything in that function or #define +except+ change the sign without terminally messing things up.)
It sails through clBuildProgram and clCreateKernel, enqueues fine, and then clFinish hangs the whole machine. [ Mac Pro ]
The cursor still follows the mouse around but the clock is frozen and so is everything else, requiring a hard boot.
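For comparison, a mask can also be consumed without doing any arithmetic on it, by picking one of two values per lane. The sketch below is a plain C host-side model with a hypothetical name (wrap_select) and 4-element arrays standing in for float4 / int4. It mirrors the behaviour of the OpenCL C built-in select(a, b, c), which returns b in each lane where the most significant bit of c is set and a otherwise. Whether that form avoids the hang on this hardware is not something the sketch can show.

```c
#include <stdint.h>

#define LANES 4

/* Per-lane selection driven by a 0 / -1 mask. A lane whose mask has its
   most significant bit set (i.e. is negative) takes the wrapped value;
   every other lane keeps the original value. The sign of the mask is
   never used in the arithmetic. */
static void wrap_select(float t[LANES])
{
    int32_t mask[LANES];

    /* Model of the vector compare: -1 where t[i] < 0, else 0. */
    for (int i = 0; i < LANES; ++i)
        mask[i] = (t[i] < 0.f) ? -1 : 0;

    for (int i = 0; i < LANES; ++i) {
        float wrapped = t[i] + 360.f;
        t[i] = (mask[i] < 0) ? wrapped : t[i];
    }
}
```

With the input { -90, 45, -1, 0 } this yields { 270, 45, 359, 0 }, matching the scalar branch.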
Does anybody have any ideas?
p.s. what fun to be here in opencl's early days, huh?