Hi, I am running some simple OpenCL tests and I found that my kernel code compiles faster on an Nvidia GPU (GeForce GTX 295) than on an AMD GPU (Cayman).

I am using a separate .cl file of 533 lines containing only one kernel. This kernel runs 1000 iterations of an algorithm. My program works as expected on the Nvidia card (and takes 0.37 secs), but on the AMD card the build takes more than 25 mins and then aborts, displaying "UNREACHABLE executed!" while building.

When I reduce the number of iterations to 10, the kernel works as expected on the AMD card, but it still takes comparatively more time to build (6 min 29 secs on the AMD card versus 20 secs on the Nvidia card).
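
For anyone who wants to reproduce the comparison, it probably helps to time the build step separately from the kernel run, so the two numbers do not get mixed up. Below is a minimal sketch of that kind of measurement in plain Python. The names `timed`, `build_program` and `run_kernel` are hypothetical stand-ins, not part of any OpenCL binding; the two dummy functions would be replaced by the real host calls (program build, and kernel launch plus wait).

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical stand-ins for the real host-side steps.
# Replace their bodies with the actual build and launch calls.
def build_program():
    time.sleep(0.05)  # pretend the compiler is working
    return "program"


def run_kernel(program):
    time.sleep(0.01)  # pretend the kernel is executing
    return "done"


if __name__ == "__main__":
    program, build_s = timed(build_program)
    _, run_s = timed(run_kernel, program)
    print(f"build: {build_s:.3f} s, run: {run_s:.3f} s")
```

Reporting the two phases separately for both cards, at 10 and at 1000 iterations, should make it clearer whether the difference comes from the compiler or from execution.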

What could be the reason?

Thanks!