I am writing a convolution algorithm in C# using the Cloo library. I run it on the GPU and test it against a serial version of the same algorithm. Unfortunately, I am getting minuscule errors in the results: 1243 of the 64407 output values are inconsistent, each off by 1, which leads me to think that the two versions are handling truncation differently.
Is there an inconsistency between the way OpenCL truncates or approximates values and the way C# does?
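
To illustrate the kind of effect I suspect, here is a minimal sketch in plain C. It is not my actual kernel or host code, the helper names are made up for illustration, and it assumes that one side computes in single precision while the other computes in double precision. A value just below an integer can round up when it is narrowed to float, so truncating afterwards gives a result that is larger by 1:

```c
/* Hypothetical helper: narrow to single precision first, then truncate.
   This mimics a code path that computes in float (an assumption). */
static int truncate_via_float(double x) {
    float narrowed = (float)x;  /* rounds to the nearest representable float */
    return (int)narrowed;       /* the cast truncates toward zero */
}

/* Hypothetical helper: truncate the double-precision value directly. */
static int truncate_via_double(double x) {
    return (int)x;              /* the cast truncates toward zero */
}
```

For example, `truncate_via_double(2.9999999)` gives 2, while `truncate_via_float(2.9999999)` gives 3, because the nearest float to 2.9999999 is exactly 3.0. Both casts truncate the same way; only the precision of the value being truncated differs.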