For the past 3 months I've been trying to parallelize a code on the graphics card, but so far I've been unsuccessful. Unfortunately my code is too large to post here.
So this is the problem: when I ran the full code, "clFinish" did not return any error, but when I read the variables back from the GPU to the CPU, I got zeros everywhere (as if nothing had happened).
I then tried to investigate why that happened by removing portions of the code. I removed pretty much everything, leaving only the initialization and a few small algebraic operations. In this case, everything worked as planned: clFinish did not return any error and the variables that I read back from the GPU had the expected values.
Since everything seemed to be working fine, I then started to add back the portions of the code that I had removed, starting with a loop that contains an inner loop. Only simple algebraic operations are done inside the loops (there are no memory access violations: I've tested the code on the CPU). Now, after adding these loops, clFinish returns CL_INVALID_COMMAND_QUEUE, and that is what I cannot understand!
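To narrow down which host-side call fails first, the status of every call can be recorded and inspected, not only the one returned by clFinish, since a problem in an earlier asynchronous command may only be reported by a later blocking call. The following is a minimal sketch in C under stated assumptions, not the actual host code: the names cl_error_name, step_result and first_failed_step are hypothetical, and the numeric values are copied from the Khronos CL/cl.h header only so that the sketch compiles without an OpenCL SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Numeric values as defined in the Khronos CL/cl.h header. They are repeated
   here only for illustration; real host code would include the header and
   use the CL_* macros directly. Only a subset of the codes is listed. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unknown OpenCL error";
    }
}

/* Hypothetical record of one host-side call and the status it returned. */
struct step_result {
    const char *step;  /* name of the call, e.g. "clEnqueueNDRangeKernel" */
    int status;        /* value returned by that call */
};

/* Returns the index of the first step whose status is not CL_SUCCESS (0),
   or -1 if every step succeeded. The failing step is reported on stderr. */
int first_failed_step(const struct step_result *steps, size_t n)
{
    assert(steps != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (steps[i].status != 0) {
            fprintf(stderr, "%s failed: %s (%d)\n", steps[i].step,
                    cl_error_name(steps[i].status), steps[i].status);
            return (int)i;
        }
    }
    return -1;
}
```

In real host code, each entry would be filled with the value returned by the corresponding call (clSetKernelArg, clEnqueueNDRangeKernel, clFinish, clEnqueueReadBuffer, and so on), so that the first non-zero status points at the step where the problem is first reported.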
Does anyone have an idea of what may be causing clFinish to return that error? I am confident that there is no mistake in the loop that I've restored, since I tested it on the CPU.
I can send the code to anyone who may be interested.