Hello All,

I wrote a kernel that evaluates an equation. When I ran it through the Visual Profiler, I found that I can't run all of the available threads in parallel: the GPU can run 1563 threads in parallel, but I only reach 1024 because of the register limit. The profiler reports that the kernel uses 26 registers per thread, which is high. Can anyone give me a hint on how to reduce the number of registers per thread?
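
Written out, the equation each work-item i evaluates looks roughly like this. It is only a reading of the kernel code; p stands for the fProb buffer, and the subscripts are flat buffer indices:

```latex
\[
GP_i \;=\; \prod_{j=0}^{\mathit{col}-1}
      \Bigl(1 - p_{\,j+\mathit{idx}}^{\;a_{\,i\cdot\mathit{col}+j}}\Bigr)^{\,t/T_j},
\qquad
\mathit{Output}_i \;=\;
\begin{cases}
1 & \text{if } GP_i \ge \mathit{rel},\\
0 & \text{otherwise,}
\end{cases}
\qquad 0 \le i \le \mathit{max}.
\]
```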

The kernel:
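// Enable double-precision support (required by OpenCL 1.0/1.1 for the double type).
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void relCalculation(const __global int* a,
                             const __global double* fProb,
                             const __global float* T,
                             __global int* Output,
                             const int max,
                             const int idx,
                             const int col,
                             const double rel,
                             const double t)
{
    const int i = get_global_id(0);

    // max is used as the last valid index (inclusive).
    if (i <= max)
    {
        // Product over the columns of row i.
        double GP = 1.0;
        for (int j = 0; j < col; ++j)
        {
            GP *= pow(1.0 - pow(fProb[j + idx], (double)a[i*col + j]), t / T[j]);
        }

        // Flag the row when the product reaches the threshold rel.
        if (GP >= rel)
            Output[i] = 1;
        else
            Output[i] = 0;
    }
}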