Hi all,

My code runs fine on a CPU device, but when I run it on a GPU device I get
error -36, which according to cl.h corresponds to CL_INVALID_COMMAND_QUEUE.
This is the piece of code that has the problem:


while (round <= rounds) {
    printf("Round %u...\n", (unsigned) round);

    /* Pass the current round number to the kernel as argument 5. */
    error = clSetKernelArg(kernel, 5, sizeof(cl_uint), &round);
    if (error != CL_SUCCESS) {
        fprintf(stderr, "ERROR: clSetKernelArg, error code %d\n", error);
        ok = 0;
        goto cleanup;
    }

    error = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL, &global_size, &local_size, 0, NULL, &event_execute);
    if (error != CL_SUCCESS) {
        fprintf(stderr, "ERROR: clEnqueueNDRangeKernel, error code %d\n", error);
        ok = 0;
        goto cleanup;
    }
    /* Wait for the kernel to finish before reading its profiling event. */
    clFinish(command_queue);

    execution_time += execution_time_msecs(event_execute);
    ++round;
}

When I run it (with global_size=64, local_size=1) on the CPU it works (it gets through all 10 rounds), but on the GPU I get:
Round 0...
Round 1...
ERROR: clEnqueueNDRangeKernel, error code -36
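
For reference, this is roughly how the number could be turned into a name when printing. It is only a minimal sketch: cl_error_name and print_cl_error are hypothetical helpers (not part of OpenCL), the table covers just a handful of codes, and the numeric values are the ones defined in the Khronos cl.h header.

```c
#include <stdio.h>

/* Hypothetical helper, not an OpenCL API: maps a few OpenCL status codes
   to their symbolic names. The numeric values are those from cl.h. */
static const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unknown (look it up in cl.h)";
    }
}

/* Hypothetical helper: prints the failing call, the raw code and its name. */
static void print_cl_error(const char *call, int code)
{
    fprintf(stderr, "ERROR: %s, error code %d (%s)\n",
            call, code, cl_error_name(code));
}
```

With this, print_cl_error("clEnqueueNDRangeKernel", error) would report the name next to the raw code in the output above.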

I suspected a synchronization problem, so I added clFinish(command_queue) after each enqueue, but it still does not work.

Any ideas?