Hi

I'm new to OpenCL and am trying to port the source code below to an Nvidia Quadro FX1700 GPU. The nested loops contain data feedback (i.e. alpha_t[s]=new_alpha_t[s]), so the intermediate results of one m iteration are used in the computations of the next. How do I achieve this data feedback in the kernel? I used a global work size of 752 and a local work size of 8 for my kernel. In addition, I unrolled the innermost loop (i.e. z) so that sum[0], sum[1], sum[2] and sum[3] are computed explicitly.

Code :
int sm_lut[4][8] = {{1,5,5,7,6,3,0,5},
                    {2,6,4,6,5,2,2,7},
                    {7,7,3,1,3,5,2,2},
                    {0,1,2,0,4,4,6,1}
                   };
 
int s,m,z;
int alpha_t[8]={0};
int new_alpha_t[8];
 
for (m=0; m<752; m++) 
{
    for (s=0; s<8; s++) 
    {
        int sum[4];
 
        for (z=0; z<4; z++)
        {
            int sm1;
            sm1 = sm_lut[z][s];
            sum[z] = alpha_t[sm1];
        }
        new_alpha_t[s]=max4(sum[0],sum[1],sum[2],sum[3]);
    }
 
    for (s=0; s<8; s++)
        alpha_t[s]=new_alpha_t[s];
}
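
For reference, the hand-unrolled z loop mentioned above could be sketched in plain C roughly as follows. This is only a host-side illustration, not the kernel itself. The max4 helper is not shown in the code above, so its definition here is an assumption (largest of four ints), and alpha_step is a hypothetical name for one iteration of the m loop.

```c
/* Assumed helper: the post does not show max4, so this is a guess
   at its behaviour (returns the largest of four ints). */
static int max4(int a, int b, int c, int d)
{
    int m = a;
    if (b > m) m = b;
    if (c > m) m = c;
    if (d > m) m = d;
    return m;
}

/* Same lookup table as in the original code. */
static const int sm_lut[4][8] = {{1,5,5,7,6,3,0,5},
                                 {2,6,4,6,5,2,2,7},
                                 {7,7,3,1,3,5,2,2},
                                 {0,1,2,0,4,4,6,1}};

/* Hypothetical name: one iteration of the m loop, with the z loop
   unrolled by hand into four separate reads. */
static void alpha_step(int alpha_t[8])
{
    int new_alpha_t[8];
    int s;

    for (s = 0; s < 8; s++) {
        int sum0 = alpha_t[sm_lut[0][s]];
        int sum1 = alpha_t[sm_lut[1][s]];
        int sum2 = alpha_t[sm_lut[2][s]];
        int sum3 = alpha_t[sm_lut[3][s]];
        new_alpha_t[s] = max4(sum0, sum1, sum2, sum3);
    }

    /* Feedback: results become the input of the next m iteration.
       All eight new values are computed before any is copied back. */
    for (s = 0; s < 8; s++)
        alpha_t[s] = new_alpha_t[s];
}
```

Calling alpha_step 752 times on an array initialised to zero reproduces the m loop above. The point of the sketch is that each m iteration needs all eight results of the previous one before it can start.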

Thanks in advance for your help.