Results 1 to 6 of 6

Thread: Data Feedback

  1. #1
    Junior Member
    Join Date
    Feb 2010
    Posts
    3

    Data Feedback

    Hi

    I'm new to OpenCL and am trying to port the source code below to Nvidia GPU Quadro FX1700 using OpenCL. There are data feedback (i.e. alpha_t[s]=new_alpha_t[s]) in the nested loops so that the intermediate results are used in the subsequent computations. How do I achieve data feedback in the kernel? I used a global work size of 752 and local work size of 8 for my kernel. In addition, I perform a loop unrolling in the innermost loop (i.e. z) to achieve sum[0], sum[1], sum[2] and sum[3].

    Code :
    sm_lut[4][8] = {{1,5,5,7,6,3,0,5},
                    {2,6,4,6,5,2,2,7},
                    {7,7,3,1,3,5,2,2},
                    {0,1,2,0,4,4,6,1}
                   };
     
    int s,m,z;
    int alpha_t[8]={0};
    int new_alpha_t[8];
     
    for (m=0; m<752; m++) 
    {
        for (s=0; s<8; s++) 
        {
            int sum[4];
     
            for (z=0; z<4; z++)
            {
                int sm1;
                sm1 = sm_lut[z][s];
                sum[z] = alpha_t[sm1];
            }
            new_alpha_t[s]=max4(sum[0],sum[1],sum[2],sum[3]);
        }
     
        for (s=0; s<8; s++)
            alpha_t[s]=new_alpha_t[s];
    }

    Thanks in advance for your help.

  2. #2
    Junior Member
    Join Date
    Dec 2009
    Posts
    22

    Re: Data Feedback

    copy it back to host

  3. #3
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Data Feedback

    If I understand you correctly, you want to feed data back between separate work-groups within one kernel execution. You can do this, but OpenCL does *not* allow synchronization between work-groups. This means you need to ensure at the algorithm level that you do not have any data races, and then you can do it through global memory. If you need synchronization (e.g., all work-groups are done before the next iteration) you have to do it through multiple kernel executions with the results stored to global memory.

  4. #4
    Junior Member
    Join Date
    Feb 2010
    Posts
    3

    Re: Data Feedback

    I need synchronization (e.g., all work-groups are done before the next iteration) due to the innermost loop (ie. z) using alpha_t from the previous loop to compute the sum values in subsequent loops.

    I used a global worksize of 6016 (= 752 * 8) and local worksize of 8 for my kernel. The data size of _beta0 - _beta3 is 6016 and they were computed from another kernel. However I am unable to get the correct results using the kernel below:
    Code :
    __kernel void forward(__global int *_dResult,
                          __global int *_beta0,
                          __global int *_beta1,
                          __global int *_beta2,
                          __global int *_beta3)
    {
        const int sm_lut[32] = {0, 1, 6, 7, 2, 3, 4, 5,
                                5, 4, 3, 2, 7, 6, 1, 0,
                                1, 0, 7, 6, 3, 2, 5, 4,
                                4, 5, 2, 3, 6, 7, 0, 1
                               };
     
        volatile int alpha_t[8];
        volatile int new_alpha_t[8];
        int sum[4] = {0};
     
        int gid = get_global_id(0);
        int lid = get_local_id(0);
        int idx = lid << 2;
        int size = get_global_size(0) / get_local_size(0);
     
        for (int i = 0; i < 8; i++)
            new_alpha_t[i] = alpha_t[i] = 0;
     
        for (int t = 0; t < size; t++)
        {
            sum[0] = alpha_t[s_to_sm1[idx]]     + _beta0[gid];
            sum[1] = alpha_t[s_to_sm1[idx + 1]] + _beta1[gid];
            sum[2] = alpha_t[s_to_sm1[idx + 2]] + _beta2[gid];
            sum[3] = alpha_t[s_to_sm1[idx + 3]] + _beta3[gid];
     
            new_alpha_t[lid] = max4(sum[0], sum[1], sum[2], sum[3]);
            alpha_t[lid] = new_alpha_t[lid];
        }
        _dResult[gid] = alpha_t[lid];
    }

    Could u pls let me know how i could compute the (6016) results through multiple kernel executions?

    Thanks for your help.

  5. #5
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Data Feedback

    If you're trying to synchronize across work-groups (which is not allowed in OpenCL except via multiple kernel executions) you just write out your new alpha_t values to global memory when you're done with one execution, and then read them in with the next execution. That way you know that before the second kernel execution starts all the values from the first kernel execution have been written out. Ideally you would re-factor your algorithm to only need synchronization within a work-group and you wouldn't have to incur that overhead. (If your kernel isn't doing a lot of work on an iteration, the overhead of starting it can be a substantial amount of the total execution.)

  6. #6
    Junior Member
    Join Date
    Feb 2010
    Posts
    3

    Re: Data Feedback

    Got it. Thanks for your help.

Similar Threads

  1. Feedback and errate for the 1.0.33 spec
    By bknafla in forum OpenCL
    Replies: 0
    Last Post: 05-18-2009, 03:11 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •