I am trying to get the following kernel to correctly add up the global ids. On its own this is pointless, but it illustrates something I am trying to make work in a larger kernel. My kernels perform a fair amount of calculation, but the end result I want to get back is a small array of various totals. Computing these totals in parallel is the part that does not seem to be working. If I execute the following kernel with a fixed number of work-items, I would like to always get the same result. For 100 work-items, say, I would expect 0+1+2+3+...+99. However, every time I run the kernel, I get a different number.
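To make the expected value concrete, here is a minimal serial sketch in plain C of the total I am after. The helper name `reference_sum` is only made up for this illustration and is not part of my real code; it simply accumulates the ids in a float the same way the kernel is meant to accumulate into c[0].

```c
#include <stddef.h>

/* Hypothetical serial reference (name chosen for illustration only):
   adds the ids 0..n-1 one at a time into a float accumulator, which is
   the total the kernel is supposed to leave in c[0]. */
float reference_sum(size_t n)
{
    float total = 0.0f;
    for (size_t index = 0; index < n; ++index) {
        total += (float)index;   /* same cast the kernel applies to its global id */
    }
    return total;
}
```

For 100 ids, `reference_sum(100)` gives 4950.0f, which is the number I would like the kernel to produce on every run.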

Is this what mem_fence is meant to solve, or is there some other technique I need to use? The number I total needs to be floating point. I also tried putting the mem_fence

kernel void AtomicSum(global write_only float* c)
{
    int index = get_global_id(0);
    mem_fence(CLK_GLOBAL_MEM_FENCE);
    c[0] += (float)index;
}