That error usually means that the code has crashed on the gpu - which means you've got a bug in your kernel code (or the arguments you're giving it).
Having it work on a cpu is a good test but it has a totally different execution engine and memory map so bugs manifest themselves differently.
What is different between the CPU and the GPU when executing a kernel?
I mean, when I run the kernel on the GPU with global_size=64 and local_size=1, and then run it with the same
parameters (global_size=64 and local_size=1) on the CPU, what is different except "command_queue"?
I was thinking that when I don't group the data (local_size=1), there is no difference between running on the CPU and the GPU, so I should get the same result from both (both the CPU and the GPU run the same kernel).
Did I misunderstand something? :?:
I commented out the entire kernel body, but nothing changed; I still get that error.
Originally Posted by notzed
When I comment out "clFinish(command_queue)", the while loop finishes correctly (I can see Round 0 ...
till Round 10 ... in the output), but after the while loop I get the same error, "error code -36", and this time
it comes from "clEnqueueReadBuffer".
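For reference, -36 in the Khronos CL/cl.h header is CL_INVALID_COMMAND_QUEUE. Because kernel launches are queued asynchronously, a failure on the device is often only reported by a later call on the same queue, which would fit the error moving from clFinish to clEnqueueReadBuffer. Below is a minimal sketch in plain C for decoding the status of each call. The names cl_error_name and check_cl are hypothetical helpers, not part of OpenCL, and only a handful of codes are listed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map a few OpenCL status codes to their names. The numeric values are
   the ones defined in the Khronos CL/cl.h header; this is only a subset. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -34: return "CL_INVALID_CONTEXT";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unknown (look it up in CL/cl.h)";
    }
}

/* Hypothetical helper: print which call failed and return 1 on error,
   0 on success. Pass it the status returned by each OpenCL call. */
int check_cl(int err, const char *call)
{
    assert(call != NULL);
    if (err != 0) {
        fprintf(stderr, "%s failed: %d (%s)\n", call, err, cl_error_name(err));
        return 1;
    }
    return 0;
}
```

In host code this would wrap every call, for example check_cl(clFinish(command_queue), "clFinish"), so the first call that reports a problem is easy to spot.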
I'm pretty sure that "command_queue" is the problem, because:
1. I commented out the entire kernel body but I get the same error, so the problem can't be in the kernel
2. I passed NULL for local_size (work-group size) so that OpenCL chooses it itself, and the error remains
3. I passed NULL for event_execute and nothing changed
4. The only thing that changes between running on the CPU (which works correctly) and the GPU (which
fails) is command_queue; all other options are the same for the CPU and the GPU
So the only error-prone option is "command_queue".
But I have no idea what else I can do, because I don't have any access to the command queue and I
don't know how to debug it.
Please help :(
I found something:
Originally Posted by a.mirzaean
Actually, when we run a kernel with global_size=m and local_size=1 on the GPU, I think OpenCL spreads the kernel
across m "compute units", each of which runs only one work-item. On the CPU we have a different scenario:
I think the only option for running a kernel on the CPU is local_size=1 (the only value that can be assigned to local_size is 1). Maybe with this constraint
we force the CPU to run serially?
Am I correct?
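To make the arithmetic concrete: in one dimension the number of work-groups is global_size / local_size, so global_size=64 with local_size=1 gives 64 work-groups of one work-item each. How those groups are then scheduled onto compute units is up to the implementation. A minimal sketch in plain C follows; num_work_groups is a hypothetical helper, and it assumes the OpenCL 1.x rule that local_size must divide global_size evenly.

```c
#include <stddef.h>

/* Number of work-groups in one dimension for a given NDRange split.
   Returns 0 when the split is invalid under the assumed OpenCL 1.x rule
   (local_size is 0 or does not divide global_size evenly). */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    if (local_size == 0 || global_size % local_size != 0) {
        return 0;
    }
    return global_size / local_size;
}
```

For the values in this thread, num_work_groups(64, 1) is 64, while a split such as num_work_groups(64, 16) would give 4 groups of 16 work-items. The largest local_size a device accepts can be queried with clGetDeviceInfo and CL_DEVICE_MAX_WORK_GROUP_SIZE.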