Thread: How to perform atomic sums on floats

  1. #1
    Junior Member
    Join Date
    May 2010
    Posts
    5

    How to perform atomic sums on floats

    I am trying to get the following kernel to correctly add up the global IDs. Of course this is pointless on its own, but it illustrates something I am trying to make work in a larger kernel. Basically, my kernels perform a fair amount of calculation, but the end result I want back is a small array of various totals. Computing these totals in parallel is the part that does not seem to be working. If I execute the following kernel with a fixed number of work-items, I would like to always get the same result. Say, for 100 work-items I would expect 0+1+2+3+ ... +99. However, every time I run the kernel, I get a different number.

    Is this what mem_fence is meant to solve, or is there some other technique I need to use? The number I total needs to be floating point. I also tried adding a mem_fence call, as in the kernel below:

    kernel void AtomicSum(
        global write_only float* c )
    {
        int index = get_global_id(0);
        mem_fence(CLK_GLOBAL_MEM_FENCE);
        c[0] += (float)index;
    }

  2. #2
    Junior Member
    Join Date
    May 2010
    Posts
    2

    Re: How to perform atomic sums on floats

    You are right, I would also expect 0+1+2+ ... +99.

    Did you try barrier(CLK_GLOBAL_MEM_FENCE); instead of mem_fence()?

  3. #3
    Junior Member
    Join Date
    Mar 2010
    Posts
    22

    Re: How to perform atomic sums on floats

    The code is not right: it does not perform the summation you want. The problem is in the way you perform the operation. You are telling OpenCL that every work-item should update the same memory location, so when a work-item does its read-modify-write, the value it read may already be stale because another work-item has updated the location in the meantime. In other words, the updates are not applied sequentially. Basically, there are two ways to do this: assign the whole summation to ONE work-item, or use a reduction, in which the summation is divided among the work-items and the partial results are then combined.

  4. #4
    Junior Member
    Join Date
    May 2010
    Posts
    5

    Re: How to perform atomic sums on floats

    No, it is doing what I want it to... I want it to update the same area of memory. Can't OpenCL work-items share memory? I thought that's what mem_fence was for.

    Basically, I am trying to write the equivalent of:

    float sum = 0;
    for (int i = 0; i < 100; i++)
        sum += (float)i;

    So that's why I am updating the same piece of memory. I perform a complex calculation thousands upon thousands of times, but I do not need to know the individual results, just the sum. Are you saying I need to allocate a very large buffer and never touch the same piece of memory twice? I got it to work that way, but it takes way too much memory.

  5. #5
    Junior Member
    Join Date
    Mar 2010
    Posts
    22

    Re: How to perform atomic sums on floats

    No... you misunderstand me. Let me give an example. You have one piece of memory for the result value and another piece of memory holding some input values. When you sum all these values into the result, each work-item reads the result memory and adds its own value. But the value it read may also have been read, at the same time, by other work-items doing the same operation. So every one of them computes its sum from the same old value and then stores it, overwriting the others, and the final result is incorrect!
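    The lost update described above can be replayed deterministically in plain C, the same language as the loop in post #4. This is only a sketch of one possible worst-case interleaving, in which every work-item reads the total before any of them writes it back; a real device may interleave differently on every run. The names racy_sum and serial_sum are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: replays a racy "c[0] += v" for n work-items in which
   every work-item reads the shared total BEFORE any of them stores to it. */
float racy_sum(const float *values, size_t n)
{
    assert(values != NULL || n == 0);
    float total = 0.0f;
    float stale = total;            /* the same old value, seen by everyone */
    for (size_t i = 0; i < n; ++i) {
        total = stale + values[i];  /* each store overwrites the previous one */
    }
    return total;
}

/* The intended behaviour: each update sees the result of the previous one. */
float serial_sum(const float *values, size_t n)
{
    assert(values != NULL || n == 0);
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}
```

    For the inputs 0, 1, 2, 3 the serial version returns 6, while this particular interleaving keeps only the last store and returns 3. Other interleavings lose different updates, which is why the kernel gives a different number on each run.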

    The atomic operations in the OpenCL specification do this correctly: each read-modify-write on the same memory location is applied as one indivisible step, so the updates are serialized. They are slow. Note also that the built-in atomic_add accepts only int and unsigned int, not float.

    Another thing: barriers are used to synchronize the work-items within the same work-group. The order in which the work-items execute after the barrier is not specified, so a barrier does not help in your code. It only guarantees that no work-item in the work-group continues past the barrier until all work-items in that work-group have executed ALL the operations before it.

  6. #6
    Junior Member
    Join Date
    Dec 2011
    Posts
    3

    Re: How to perform atomic sums on floats

    You can use the atomic_cmpxchg function and a C union to implement it for floating point:
    http://suhorukov.blogspot.com/2011/1...-floating.html

    You can also implement several "reduce" steps in your program to aggregate a large data set in parallel, which avoids contention on a single location, and produce the result in the last "reduce" step: http://en.wikipedia.org/wiki/Fold_%2...er_function%29
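    The "reduce" steps can be sketched in plain C as follows. This is a sequential simulation under stated assumptions, not a kernel: the inner loop stands in for the work-items of one step, and the outer loop for the successive steps, which on a device would need a barrier within a work-group or a separate kernel launch between them. The name reduce_sum is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums data[0..n) in place with repeated reduce steps.
   Within one step, index i writes only data[i] and reads only
   data[i + half], so no two iterations touch the same element and each
   could be run by a separate work-item. The buffer is overwritten. */
float reduce_sum(float *data, size_t n)
{
    assert(data != NULL || n == 0);
    if (n == 0) {
        return 0.0f;
    }
    while (n > 1) {
        size_t half = (n + 1) / 2;   /* elements that remain after this step */
        for (size_t i = 0; i + half < n; ++i) {
            data[i] += data[i + half];
        }
        n = half;
    }
    return data[0];
}
```

    For 100 inputs this takes 7 steps (100, 50, 25, 13, 7, 4, 2 active elements) and needs no memory beyond the input buffer. Because the additions happen in a fixed order, the result is the same on every run.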
