Is there anything similar to GPUPerfAPI for NVIDIA's OpenCL implementation? What libraries and tools do you use on NVIDIA hardware to understand the performance of OpenCL kernels?