clEnqueueReadBuffer( myQueue, myMem, CL_FALSE, 0, size, hostBuffer, 0, NULL, NULL );

I can't get this call to be non-blocking.
In theory, since the blocking parameter is CL_FALSE, it should return immediately, but the measured time of clEnqueueReadBuffer grows as I increase size, so the call's duration clearly depends on the amount of data being read.

I have run many tests to check this: using timers, varying size, and passing a non-NULL cl_event as the last parameter. The contents of hostBuffer are always correct when I print them, even if I start copying hostBuffer to another buffer immediately after the call. With a truly asynchronous read I would expect that immediate copy to sometimes pick up incomplete data, so it seems clEnqueueReadBuffer does not return until the transfer has completely finished. That is the behaviour I would expect with CL_TRUE as the blocking parameter, but not with CL_FALSE.

I even reinstalled a couple of different NVIDIA drivers, with the same result.
Is there any OpenCL initialization or configuration I must set correctly to make non-blocking reads work?
What am I missing?