How can OpenCL be faster than OpenMP on the same device?
I am running a very simple Perlin noise algorithm on my Core i7, and I get these results:
33554432 elements, 8 work groups
simple CPU : 1.33 sec
OpenMP CPU : 0.27 sec (4.9 times faster than simple CPU)
OpenCL CPU : 0.15 sec (8.7 times faster than simple CPU)
Is this normal, or even possible, or is it just a bug in my OpenCL code?
NVIDIA GTX 275, driver 195.62
Intel Core i7
AMD ATI Stream SDK 2 beta 4
Visual Studio 2008