I am trying to get the following kernel to correctly add up the global ids. Of course this is pointless on its own, but it illustrates something I am trying to make work in a larger kernel. Basically, my kernels perform a fair amount of calculation, but the end result I want to get back is a small array of various totals. Computing these totals in parallel is the part that does not seem to be working. If I execute the following kernel with a fixed number of work units, I would like to always get the same result. Say, for 100 work units, I would expect 0+1+2+3+ ... +99. However, every time I run the kernel I get a different number.
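To make the expected value concrete, here is a minimal host-side sketch in plain C. The function name expected_sum is something I made up for illustration; it is not part of OpenCL or any other API. It adds the ids serially with the same float cast the kernel uses, so for 100 work units it should give 4950, which is the number I would like the kernel to produce on every run.

```c
/* Serial reference for the total the kernel is meant to produce.
   expected_sum is a hypothetical helper name, used only for this illustration. */
float expected_sum(int n)
{
    float total = 0.0f;
    for (int i = 0; i < n; ++i) {
        total += (float)i;  /* same cast as in the kernel */
    }
    return total;
}
```

Comparing the kernel's output against expected_sum(100) on the host is how I can tell that the result changes from run to run.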
Is this what mem_fence is meant to solve, or is there some other technique I need to use? The number I total needs to be floating point. I also tried putting the mem_fence
kernel void AtomicSum(
    global write_only float* c )
{
    int index = get_global_id(0);
    /* Every work unit adds into the same element with no synchronization. */
    c[0] += (float)index;
}