I'd like to see a higher-accuracy ULP profile, and some way to detect or request it.

If you compare the minimum ULP accuracy figures in the CUDA documentation with those in the OpenCL spec, you will find that CUDA requires a higher minimum accuracy.
That is, if you write to the CUDA API, you are guaranteed higher accuracy.
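
For anyone who wants to check what a given device actually delivers against either set of figures, here is a rough host-side sketch of measuring the distance between two single-precision values in ULPs. The function names are my own and not part of any OpenCL or CUDA API. It counts the representable float32 values between a result and a reference that has already been rounded to float32, which only approximates the spec's definition of ULP error (that one is relative to the infinitely precise result). NaN and values outside the float32 range are not handled.

```python
import struct


def float32_to_ordered_int(x: float) -> int:
    """Round x to float32 and map its bit pattern to an integer that
    sorts in the same order as the float values (hypothetical helper)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Negative floats have the sign bit set; mirror them below zero so
    # that -0.0 and +0.0 both map to 0 and ordering is monotonic.
    if bits & 0x80000000:
        return 0x80000000 - bits
    return bits


def ulp_distance(a: float, b: float) -> int:
    """Number of representable float32 steps between a and b."""
    return abs(float32_to_ordered_int(a) - float32_to_ordered_int(b))


# Example: 1.0 and the next float32 above it are one step apart.
one = 1.0
next_up = struct.unpack("<f", struct.pack("<I", 0x3F800001))[0]
print(ulp_distance(one, next_up))  # 1
```

Running a kernel over a range of inputs and taking the maximum of ulp_distance(device_result, reference) gives a figure that can be set against the published bounds, assuming the reference was computed in higher precision first.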

Also, there are no equivalents of the CUDA compiler options -prec-sqrt=true and -prec-div=true.