Results 1 to 6 of 6

Thread: problem understanding the behaviour of my kernel!

  1. #1
    Junior Member
    Join Date
    Feb 2011
    Posts
    4

    problem understanding the behaviour of my kernel!

    Hello,

    I am having problems understanding what a simple 1D kernel
    is doing with local memory.
    The kernel does nothing useful at the moment, I simplified it for
    this post, but nevertheless exhibits what seems to me strange
    behaviour.

    Kernel code:
    Code :
    #define STENCILSIZE(2)
    #define x11_size_x (20)
     
    __kernel void fluxes_x11_fast3(
    			       __global double* rhoe_in,
    			       __global double* rhoe_out  ) 
    {   
      int ind, num_point;
     
      // 1D version
      int main_ref = get_global_id(0); 
     
      //Identification of workgroup
      int i = get_group_id(0); 
     
      //Identification of work item inside workgroup
      int idX = get_local_id(0); 
      int sizeX = get_local_size(0); 
     
      __local double lrhoe[x11_size_x+2*STENCILSIZE];
     
      //Copy submatrixes to local memory. One element copied per work item
     
        lrhoe[idX+STENCILSIZE]   = rhoe_in[main_ref];  
     
      // "Edge" effects on the left 
      if(idX == 0 && i >= 1) {
     
        for (ind = 0; ind <= STENCILSIZE-1; ind++ ) { 
          lrhoe[idX+ind]   = rhoe_in[main_ref-STENCILSIZE+ind];
        }
      }
     
     
      // "Edge" effects on the right 
      if(idX == x11_size_x-1 && i <= (int)get_num_groups(0)-2 ) {
     
        for (ind=1; ind<=STENCILSIZE; ind++){
          lrhoe[idX+STENCILSIZE+ind]   = rhoe_in[main_ref+ind];
        }
      }
     
      // synchronise all the local memory
      barrier(CLK_LOCAL_MEM_FENCE);
     
      // main computation bit
     
      if ( main_ref >= STENCILSIZE && main_ref <=DATA_SIZE_X-1-STENCILSIZE ) {
     
     rhoe_out[main_ref] = lrhoe[idX+STENCILSIZE];
     
      }       
     
    }

    The kernel has one input array, and one output array.
    All I'm trying to do for the moment is pull the input array into local
    memory, while taking "edge" effects into account (i.e. for each point
    in the input array, the local array would have neighbours of this point on
    each side).
    In the output array, I insert a value from the local array.

    What I don't understand is the values I get from the local array, see last line of
    actual code:
    - if I take lrhoe[idX+STENCILSIZE], I get the desired result, i.e. the input value.
    - if I take lrhoe[idX+STENCILSIZE-1], I expect the input array shifted by one to the
    right, but instead I get the input array shifted by 6.

    Can anyone explain this behaviour, or suggest what I might be doing wrong?
    I can also post host code if this is deemed useful.

    Many thanks!

    Olivier

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: problem understanding the behaviour of my kernel!

    That sounds odd. Also, shouldn't the last line read like this?
    Code :
    rhoe_out[main_ref-STENCILSIZE] = lrhoe[idX+STENCILSIZE];
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date
    Feb 2011
    Posts
    4

    Re: problem understanding the behaviour of my kernel!

    Hi David, and thanks alot for answering.
    The kernel here is a much simplified finite-difference kernel,
    which I have got working properly in 1 and 2D using only global memory.
    The input and output arrays represent my entire domain, and thus "main_ref"
    is the index into the entire domain. The line you asked about,
    rhoe_out[main_ref] = lrhoe[idX+STENCILSIZE];
    represents the computation of a new value rhoe from its neighbouring
    values; at the moment it is simply a duplication (i.e. lrhoe[idX+STENCILSIZE] should equal
    rhoe_in[main_ref]) but the idea is be to be able to do something like
    Code :
    double rhoe_out_private = lrhoe[idX+STENCILSIZE];
    for (index=1;index<=STENCILSIZE;index++)  {
         rhoe_out_private += 
                   coef[index] * (lrhoe[idX+STENCILSIZE+index] - lrhoe[+STENCILSIZE-index]);
    }
    rhoe_out[main_ref] = rhoe_out_private;

    So as I see it, the index I want into the out array is main_ref.
    I only update if stencilsize<main_ref<global_size-stencilsize due to a different treatment applied at boundaries.
    Does that make sense?
    Thanks again, I am at a loss!
    Olivier

  4. #4
    Junior Member
    Join Date
    Feb 2011
    Posts
    4

    Re: problem understanding the behaviour of my kernel!

    To add to my confusion, the kernel does exactly what I expect it to do, when run
    on the cpu with the amd sdk, that's to say it duplicates the input array
    with the following line
    Code :
    rhoe_out[main_ref] = lrhoe[idX+STENCILSIZE];
    and shifts it by one to the right with the following line
    Code :
    rhoe_out[main_ref] = lrhoe[idX+STENCILSIZE-1];

    Does this point to a bug in the nvidia implementation, or am I perhaps misusing
    some aspect of local memory to which the amd cpu implementation is less
    sensitive?
    Many thanks,
    Olivier

  5. #5
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: problem understanding the behaviour of my kernel!

    It looks like a possible bug in NVidia's compiler. I suggest sending it to their customer support.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  6. #6
    Junior Member
    Join Date
    Feb 2011
    Posts
    4

    Re: problem understanding the behaviour of my kernel!

    OK, I'll do that and update this thread when I get a response from NVIDIA.
    Thanks alot for the help,
    Olivier

Similar Threads

  1. COMMAND QUEUE weird behaviour
    By a.mirzaean in forum OpenCL
    Replies: 2
    Last Post: 07-24-2012, 05:17 AM
  2. kernel understanding
    By Asengott in forum OpenCL
    Replies: 0
    Last Post: 01-19-2011, 02:58 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •