Out-of-order execution is one of the key features of OpenCL. However, I can only get it to work with Intel's implementation; with Nvidia's implementation, I see no difference from in-order execution. See my post on Nvidia's OpenCL forum.
Can anyone summarize the support status of out-of-order execution on the popular OpenCL platforms?