So I've been searching for a profiling tool that will allow me to profile and optimize my OpenCL kernels.
I'm using Ubuntu 12.04 64-bit with an Intel i7 3930K.
I have access to both an AMD GPU (HD6870) and NVidia GPU (GTX 580).

NVidia:
I've tried using NVidia's Visual Profiler (nvvp), but when I try to profile my OpenCL application
I just get "Warning: No CUDA application was profiled, exiting".
Although I can't find any mention of this in NVidia's documentation, it seems as though
nvvp does not support profiling of OpenCL kernels, only CUDA. Is this correct?

AMD:
AMD's APP Profiler is limited to Visual Studio.
While I could use Windows (at least for profiling), I don't have access to a full version of Visual Studio, and the Express editions don't seem to be supported, so this is not an option.

Intel:
Intel supposedly supports OpenCL profiling on Linux only through their "VTune™ Amplifier", which costs $899 (not an option).
Still, I tried the trial version, but was unsuccessful: I only got profiling information for the host code. I don't see why I should pay 900 bucks for that when I can do it for free with gprof or OProfile, so again: not really an option.

So, what other tools are available that I could use to profile my OpenCL device code?