Hello, I'm just starting to program with OpenCL. I'm running it on my CPU because I don't currently have a compatible GPU.

I think that my question should be easy to solve.

Suppose I have an array of 4 integers [0, 2, 0, 0] and a kernel function like this:

Code :
__kernel void
tst(__global int *s,
	__global int *answer)
{
	int gid = get_global_id(0);
	/* gid > 0 keeps work-item 0 from reading s[-1] */
	if(gid > 0 && s[gid] == 0 && s[gid - 1] > 0) {
		s[gid] = s[gid - 1];
	}
	answer[gid] = s[gid];
}

I expect the 4 elements to be evaluated in parallel, giving [0, 2, 2, 0]. Instead I get [0, 2, 2, 2], as if the elements were processed sequentially.
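
To make the two outcomes concrete, here is a minimal plain-C sketch (host code, not OpenCL). The function names run_in_order and run_from_snapshot are made up for illustration, and both include a gid > 0 guard so that index 0 never reads s[-1]. The first function applies the kernel body in place, one index after another, and reproduces the result I get. The second computes every element from an input that is never modified, which is the behaviour I was expecting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: applies the kernel body in place for
   gid = 0, 1, 2, ... in order, so each iteration can see the
   values written to s by the earlier ones. */
void run_in_order(int *s, int *answer, size_t n)
{
    assert(s != NULL && answer != NULL);
    for (size_t gid = 0; gid < n; ++gid) {
        if (gid > 0 && s[gid] == 0 && s[gid - 1] > 0) {
            s[gid] = s[gid - 1];   /* visible to later iterations */
        }
        answer[gid] = s[gid];
    }
}

/* Hypothetical helper: computes each element only from the original
   input, which is never written to, so the order of evaluation
   cannot change the result. */
void run_from_snapshot(const int *s, int *answer, size_t n)
{
    assert(s != NULL && answer != NULL);
    for (size_t gid = 0; gid < n; ++gid) {
        int v = s[gid];
        if (gid > 0 && v == 0 && s[gid - 1] > 0) {
            v = s[gid - 1];        /* reads the untouched input only */
        }
        answer[gid] = v;
    }
}
```

Calling run_in_order on {0, 2, 0, 0} fills answer with {0, 2, 2, 2}, because the value written at index 2 is picked up again at index 3. Calling run_from_snapshot on the same input fills answer with {0, 2, 2, 0}.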

What am I doing wrong?

Many thanks for your help.