I have a simple kernel [ http://goo.gl/PBTsg ] that has been working fine on Mac OS X (ATI HD4850 / nVidia GeForce whatever) for a while. I am using JOCL as a wrapper, and here is the code that invokes the kernel, if anyone is curious [ http://goo.gl/qFhel ].
I decided to give it a go on SUSE Linux Enterprise 11 x64 (nVidia Tesla “Fermi” M2050). Other than a few tweaks to make the compiler happy (the same tweaks were applied to the Mac version and tested there too), everything seems fine ... except it's not!
I am running the same computation (defined in the kernel) over around 300 items in parallel with exactly the same initial conditions (all the input buffers are populated with the same values) and plotting the results, so all the plots should look identical.
When I run on Mac OS X, all the plots show the waveform I expect and look the same. On Linux, some of them are completely empty. The ones that are not empty are indeed fine (which confirms the kernel is doing its job), but most of them seem to be missing data altogether.
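One way to pin down which items come back empty is to scan the output buffer on the host after reading it back. The following is a minimal sketch, not the code linked above: it assumes the results arrive as one flat `float[]` with a fixed number of samples per item laid out contiguously, and that "empty" means every sample is zero or NaN. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyItemFinder {

    /**
     * Returns the indices of items whose samples are all zero or NaN.
     * Assumed layout (hypothetical): item i occupies
     * results[i * samplesPerItem] .. results[(i + 1) * samplesPerItem - 1].
     */
    public static List<Integer> findEmptyItems(float[] results, int samplesPerItem) {
        if (samplesPerItem <= 0 || results.length % samplesPerItem != 0) {
            throw new IllegalArgumentException(
                    "results length must be a multiple of samplesPerItem");
        }
        List<Integer> empty = new ArrayList<>();
        int itemCount = results.length / samplesPerItem;
        for (int item = 0; item < itemCount; item++) {
            boolean hasData = false;
            for (int s = 0; s < samplesPerItem; s++) {
                float v = results[item * samplesPerItem + s];
                // A sample counts as data only if it is neither zero nor NaN.
                if (v != 0.0f && !Float.isNaN(v)) {
                    hasData = true;
                    break;
                }
            }
            if (!hasData) {
                empty.add(item);
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        // Four made-up items of two samples each; items 1 and 2 hold no data.
        float[] results = {1f, 2f, 0f, 0f, Float.NaN, 0f, 3f, 0f};
        System.out.println(findEmptyItems(results, 2)); // prints [1, 2]
    }
}
```

Comparing the returned indices against the work-group size may show whether the gaps follow a pattern, for example whole work-groups going missing.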
I recall something similar from when I was trying to use double precision on machines that didn't support it: in that case, for some reason, only the first half of the elements were being processed (so out of 300, only the first 150 were computed). In this case, though, I cannot find any obvious pattern.
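To rule out a repeat of the double-precision problem, the device's extension string can be checked for `cl_khr_fp64`. This is only a small sketch: it assumes the string has already been queried through `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS` (the query itself is omitted), and the class and method names are hypothetical.

```java
import java.util.Arrays;

public class Fp64Check {

    /**
     * Returns true if the space-separated extension list contains the exact
     * token cl_khr_fp64. trim() also drops a trailing NUL terminator, which
     * can be present when the string is built from a raw byte buffer.
     */
    public static boolean supportsDouble(String deviceExtensions) {
        if (deviceExtensions == null) {
            return false;
        }
        String[] tokens = deviceExtensions.trim().split("\\s+");
        return Arrays.asList(tokens).contains("cl_khr_fp64");
    }

    public static void main(String[] args) {
        // Sample extension strings written by hand for illustration,
        // not output captured from a real device.
        String withFp64 = "cl_khr_byte_addressable_store cl_khr_fp64 cl_nv_compiler_options";
        String withoutFp64 = "cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics";
        System.out.println(supportsDouble(withFp64));    // prints true
        System.out.println(supportsDouble(withoutFp64)); // prints false
    }
}
```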
I understand I cannot ask people to solve this for me with so little info, so all I am asking is for someone with a bit of experience in typical OpenCL cross-platform issues to have a look at my kernel to see if they can spot any of the *usual suspects*.
Any help/advice/suggestions in terms of troubleshooting appreciated!