I would like to know more about how OpenCL handles synchronization under the hood. As I understand it, once a kernel has finished, its results are visible to the host, so the host can copy the data back. In other words, if I pass CL_TRUE as the blocking flag (to clEnqueueReadBuffer, for example, since clEnqueueNDRangeKernel itself takes no such flag), the host has to wait until the command completes. I understand that this is built on an event mechanism, but how does the GPU actually signal an event so that the CPU can see it? More generally, how does a device signal an event to the host?

Thank you very much.