Results 1 to 3 of 3

Thread: different local variables share memory?

  1. #1
    Junior Member
    Join Date
    Dec 2010
    Posts
    1

    different local variables share memory?

    Hello guys, i am starting in OpenCL and i have stumbled upon wierd behaviour. It is probably some basic thing but i can't figure it out. Any help appreciated.
    I have following kernel function:

    Code :
    __kernel void test()
    {
    * * __local float var1;
    * * __local float var2;
    * * 
    * * int x = get_local_id(0);
    * * var2 = 0;
    * * var1 += x;
    * * 
    * * barrier(CLK_LOCAL_MEM_FENCE);
    * * printf("x: %d var1: %f var2: %f \n", x, var1, var2);
    }

    which i am invoking in following way:

    Code :
    *
    * * * * cl::Kernel kernel_test(program, "test");
    * * * * queue.enqueueNDRangeKernel(
    * * * * * * kernel_test, 
    * * * * * * cl::NullRange,
    * * * * * * cl::NDRange(10),
    * * * * * * cl::NDRange(10),
    * * * * * * NULL,
    * * * * * * &event);
    *
    * * * * event.wait();

    and i get following result:
    Code :
    *
    x: 0 var1: 9.000000 var2: 9.000000
    x: 1 var1: 9.000000 var2: 9.000000
    x: 2 var1: 9.000000 var2: 9.000000
    x: 3 var1: 9.000000 var2: 9.000000
    x: 4 var1: 9.000000 var2: 9.000000
    x: 5 var1: 9.000000 var2: 9.000000
    x: 6 var1: 9.000000 var2: 9.000000
    x: 7 var1: 9.000000 var2: 9.000000
    x: 8 var1: 9.000000 var2: 9.000000
    x: 9 var1: 9.000000 var2: 9.000000
    *

    which suprised me because i was under impression that local variables should be shared between work-items in work-group and i am invoking one workgroup that contains 10 items. Threfore i should have a total sum<1-10> in var1. What suprised me even more is the fact, that content of var2 is identical with var1, which i don't see any reason for.

    I have ATI stream 2.2 installed, running this code on CPU (obviously) which is Phenom II X4 925

    Thx for help!

  2. #2
    Member
    Join Date
    Mar 2010
    Location
    Raleigh, NC
    Posts
    55

    Re: different local variables share memory?

    You are correct that __local describes variables that are allocated in local memory and shared by all work-items of a work-group.

    I think your first dilema comes from the CLK_LOCAL_MEM_FENCE barrier you are seeing. From the Spec, a barrier essentially means that:

    "All work-items in a work-group executing the kernel
    on a processor must execute this function before any
    are allowed to continue execution beyond the barrier.
    This function must be encountered by all work-items in
    a work-group executing the kernel."
    The more I look at this, though, the more confused I personally become. What I would do is try removing that barrier and seeing what happens. The next step would be reduce the local work-group size to 1, and switch the

    Code :
    int x = get_local_id(0);
    to
    Code :
    int x = get_global_id(0);

    I have a small theory as to why it is operating strangely, but the theory breaks down when looking at the x value in the output.

    I'm not sure why var2 is not 0 though.

  3. #3
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: different local variables share memory?

    As HolyGeneralK said, __local variables are shared across all work-items in a work-group. That means that you have to be careful when you update their values, or otherwise you will see the kind of problem you have today.

    The first problem is that var1 was never initialized. You increment its value, but the starting value is undefined, so the final result will also be undefined.

    In addition to that, think about this line of code from your kernel:

    Code :
        var1 += x;

    What is happening is that multiple work-items are trying to read and write into 'var1' simultaneously, stomping on each other's feet and producing incorrect results.

    One way to avoid this problem is using atomic operations (section 6.11.11 of the OpenCL 1.1. spec). Atomics guarantee that the output will be correct even if multiple work-items attempt to update the same variable at the same time. Unfortunately they are rather slow.

    The local barrier you place at the end would only be useful if different work-items were writing into different locations in __local memory and after the barrier you want them to read the most up-to-date values stored in it.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

Similar Threads

  1. Replies: 6
    Last Post: 02-28-2013, 04:59 PM
  2. atomics for shared memory variables?
    By naroqueen in forum OpenCL
    Replies: 1
    Last Post: 09-12-2011, 01:57 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •