I'm new to OpenCL and am trying to port the source code below to an NVIDIA Quadro FX 1700 GPU using OpenCL. The nested loops contain data feedback (i.e. alpha_t[s] = new_alpha_t[s]): the results of one iteration of the outer m loop are the inputs of the next iteration. How do I implement this data feedback in the kernel? I used a global work size of 752 and a local work size of 8 for my kernel. In addition, I unrolled the innermost loop (over z) to compute sum[0], sum[1], sum[2] and sum[3] directly.
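With a global work size of 752, each work-item would presumably handle one m iteration, but iteration m reads the alpha_t written by iteration m-1. OpenCL gives no way for work-items in different work-groups to wait for one another inside a single kernel launch (barrier() only synchronizes the work-items of one work-group). One common workaround is to launch a kernel of 8 work-items (one per s) once per m step and swap the input and output buffers between launches. The C code below is only a CPU emulation of that idea, using the names from the loop nest in this post; `step_kernel`, `run_host_loop` and `combine_placeholder` are hypothetical, and `combine_placeholder` merely stands in for the rule that turns sum[0..3] into new_alpha_t[s], which the post does not show.

```c
#include <assert.h>

#define NS 8   /* states s: one work-item per state in this sketch */
#define NZ 4   /* rows of sm_lut, i.e. the z loop */

/* Hypothetical placeholder for the (unshown) rule that turns sum[0..3]
   into new_alpha_t[s]; here it simply takes the maximum. */
static int combine_placeholder(const int sum[NZ])
{
    int best = sum[0];
    for (int z = 1; z < NZ; z++)
        if (sum[z] > best)
            best = sum[z];
    return best;
}

/* Body of a hypothetical kernel for ONE m step, run by work-item gid (= s).
   It only reads `in` and only writes `out`, so the 8 work-items of one
   launch never overwrite data another work-item still needs. */
static void step_kernel(const int lut[NZ][NS], const int *in, int *out, int gid)
{
    int sum[NZ];
    for (int z = 0; z < NZ; z++) {
        int sm1 = lut[z][gid];
        assert(sm1 >= 0 && sm1 < NS);   /* LUT entries index alpha_t */
        sum[z] = in[sm1];
    }
    out[gid] = combine_placeholder(sum);
}

/* Host side: one "launch" per m step, then swap the two buffers.
   The swap plays the role of alpha_t[s] = new_alpha_t[s]. */
static void run_host_loop(const int lut[NZ][NS], int alpha_t[NS], int steps)
{
    int buf[NS];
    int *in = alpha_t, *out = buf;
    for (int m = 0; m < steps; m++) {
        for (int gid = 0; gid < NS; gid++)   /* global work size 8 */
            step_kernel(lut, in, out, gid);
        int *tmp = in;                       /* ping-pong the buffers */
        in = out;
        out = tmp;
    }
    if (in != alpha_t)                       /* odd step count: copy back */
        for (int s = 0; s < NS; s++)
            alpha_t[s] = in[s];
}
```

With real OpenCL buffers, the swap would be done by exchanging the two buffer arguments with clSetKernelArg between clEnqueueNDRangeKernel calls. Since each launch does very little work, the overhead of 752 launches may dominate, so this is best treated as a simple, correct baseline.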

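Code:
int sm_lut[4][8] = {{1,5,5,7,6,3,0,5},
                    /* ... remaining three rows omitted ... */};
int s, m, z;
int alpha_t[8] = {0};
int new_alpha_t[8];
for (m = 0; m < 752; m++) {
    for (s = 0; s < 8; s++) {
        int sum[4];
        for (z = 0; z < 4; z++) {
            int sm1 = sm_lut[z][s];   /* index into alpha_t */
            sum[z] = alpha_t[sm1];
        }
        /* ... new_alpha_t[s] is computed from sum[0..3] (omitted) ... */
    }
    /* data feedback: the results of step m are the inputs of step m+1 */
    for (s = 0; s < 8; s++)
        alpha_t[s] = new_alpha_t[s];
}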
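Another option is to run the whole m loop inside one kernel launched as a single work-group of 8 work-items (global and local work size both 8), one work-item per s, with alpha_t kept in __local memory. Each step is split into a read phase and a write phase separated by barrier(CLK_LOCAL_MEM_FENCE), so no work-item overwrites an alpha_t entry that another work-item still has to read. The C code below is a minimal CPU emulation of that structure, not a kernel: each `for (lid ...)` loop stands for all work-items executing the same statements, the barrier positions are marked as comments, and `run_workgroup` and `combine_placeholder` are hypothetical names (the latter again replaces the update rule that the post omits).

```c
#include <assert.h>

#define NS 8   /* states s == local work size */
#define NZ 4   /* rows of sm_lut (the z loop) */

/* Hypothetical placeholder for the unshown update sum[] -> new_alpha_t[s]. */
static int combine_placeholder(const int sum[NZ])
{
    int best = sum[0];
    for (int z = 1; z < NZ; z++)
        if (sum[z] > best)
            best = sum[z];
    return best;
}

/* Emulates ONE work-group of NS work-items that loops over all m steps
   inside the kernel, keeping alpha_t in (emulated) local memory. */
static void run_workgroup(const int lut[NZ][NS], int alpha_t[NS], int steps)
{
    int local_alpha[NS];  /* would be __local int local_alpha[8] */
    int priv_new[NS];     /* one private variable per work-item */

    for (int z = 0; z < NZ; z++)          /* LUT entries must index alpha_t */
        for (int s = 0; s < NS; s++)
            assert(lut[z][s] >= 0 && lut[z][s] < NS);

    for (int lid = 0; lid < NS; lid++)    /* lid = get_local_id(0) */
        local_alpha[lid] = alpha_t[lid];
    /* barrier(CLK_LOCAL_MEM_FENCE); */

    for (int m = 0; m < steps; m++) {
        /* Phase 1: every work-item only READS local_alpha. */
        for (int lid = 0; lid < NS; lid++) {
            int sum[NZ];
            sum[0] = local_alpha[lut[0][lid]];   /* unrolled z loop */
            sum[1] = local_alpha[lut[1][lid]];
            sum[2] = local_alpha[lut[2][lid]];
            sum[3] = local_alpha[lut[3][lid]];
            priv_new[lid] = combine_placeholder(sum);
        }
        /* barrier(CLK_LOCAL_MEM_FENCE);  all reads of step m are done */

        /* Phase 2: each work-item WRITES only its own entry (the feedback). */
        for (int lid = 0; lid < NS; lid++)
            local_alpha[lid] = priv_new[lid];
        /* barrier(CLK_LOCAL_MEM_FENCE);  step m+1 sees the new values */
    }

    for (int lid = 0; lid < NS; lid++)    /* copy the result back to global */
        alpha_t[lid] = local_alpha[lid];
}
```

In an actual kernel, the phase loops disappear (each work-item runs its own lid), and the commented barriers become real barrier(CLK_LOCAL_MEM_FENCE) calls reached by all 8 work-items on every iteration. Only one small work-group is busy this way, so if several independent alpha_t sequences exist, giving each its own work-group would be the natural way to occupy more of the GPU.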

Thanks in advance for your help.