
View Full Version : Is float point operation in OpenCL stochastic?



linyufly
11-06-2012, 02:01 PM
Hi guys,

I have a floating-point calculation kernel that sometimes gives different results than what it produces in most runs.

Sometimes I build the kernel with "-cl-opt-disable" and sometimes with no options at all ("").

As far as I know, there should be no race conditions.

My GPU is an NVIDIA GeForce 9800 GTX/9800 GTX+. I think the compute capability is 1.1 or 1.0, and the device version is OpenCL 1.0.

Thanks!

notzed
11-07-2012, 01:27 AM
No, of course they aren't stochastic. Floating-point operations work deterministically; otherwise they wouldn't be much use, would they? If they weren't deterministic on a GPU, GPUs wouldn't even be useful for graphics, let alone anything more.

That doesn't mean compilation options can't alter the results: if different instructions or a different instruction order is generated, the outcome can change, because floating-point operations are not associative. See e.g. http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems

But the only time I've seen 'random' results is with broken code. Possibly because of one or more of: incorrect or missing initialisation, boundary over-runs, race conditions, or some other bug.
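To make the non-associativity point concrete, here's a small Python sketch (my own illustration, not from an OpenCL kernel) that emulates IEEE-754 single-precision rounding with the struct module. The same three values summed in two groupings give two different answers, which is exactly what a compiler can trigger by reordering instructions:

```python
import struct

def f32(x):
    """Round a Python float (double) to IEEE-754 single precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

a, b, c = 1.0e8, -1.0e8, 1.0

# (a + b) + c: a and b cancel exactly first, then c is added
left = f32(f32(a + b) + c)

# a + (b + c): c is rounded away against b's large magnitude
right = f32(a + f32(b + c))

print(left, right)  # 1.0 0.0
```

Neither result is "wrong"; each is the correctly rounded outcome of a different evaluation order, which is why identical source can produce different answers under different build options.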

linyufly
11-07-2012, 06:27 AM
Thanks notzed. What are boundary over-runs?

Best regards,
Mingcheng


linyufly
11-07-2012, 07:42 AM
Hi notzed,

Is it possible that one of the processors on my card is malfunctioning?

Thanks!


LeeHowes
11-08-2012, 03:39 PM
Remember also that the order of floating-point operations matters both within a work-item (notzed's compiler point) and between work-items, so any code where the order of operations depends on hardware scheduling may be nondeterministic even if locks etc. are used correctly. Without hardware error control there is also the possibility that an occasional error creeps in, but that is unlikely to show up in anything but very long runs.
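Here's a hedged Python sketch (hypothetical, not actual OpenCL) of why scheduling order between work-items matters. It emulates a serialized single-precision accumulation, the kind you'd get from work-items atomically adding into one total; the only thing that differs between the two runs is the order the contributions arrive:

```python
import struct

def f32(x):
    """Round a Python float (double) to IEEE-754 single precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

def reduce_f32(values):
    """Sum in the given order, rounding every partial sum to single
    precision, as a serialized float accumulation on a GPU would."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total

data = [1.0e8, -1.0e8] + [1.0] * 10

order_a = data[:]      # one possible completion order of the work-items
order_b = data[::-1]   # another completion order, same set of values

print(reduce_f32(order_a), reduce_f32(order_b))
```

Same inputs, same reduction code, different scheduling order, different result. On real hardware the order of atomic float additions across work-groups is not guaranteed, so such a sum can legitimately vary from run to run.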

linyufly
11-08-2012, 05:22 PM
Thanks LeeHowes!

However, why does the order of operations between work-items influence the result?

Thanks!


LeeHowes
11-08-2012, 05:33 PM
As notzed said:

if different instructions or instruction order is created as the associativity of floating point operations is data dependent. e.g. http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems

For example, if you add an alternating sequence:
2^-30, 2^30, 2^-30, 2^30 ...
you'll end up with the same result as adding:
0, 2^30, 0, 2^30 ...

because the mantissa of a float isn't big enough to hold the extra bits of precision.
However, if you sorted the list:
2^-30, 2^-30, ..., 2^30, 2^30, ...

the 2^-30 values would accumulate first, and if you add enough of them the partial sum may grow large enough to affect the additions with 2^30, so the result could be significantly different given the right combination of values and length of list. That's an extreme case; in the general case any operation on floats incurs a possible loss of precision, and the order in which you do the operations changes which information is lost.
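The sorted-versus-interleaved effect above can be demonstrated directly. This is my own Python sketch (emulating single precision via struct, and using 2^-10 rather than 2^-30 so the run stays short): interleaved, each small value is absorbed by the big running total; sorted, the small values accumulate into a chunk large enough to survive:

```python
import struct

def f32(x):
    """Round a Python float (double) to IEEE-754 single precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

small, big = 2.0**-10, 2.0**30
n = 1 << 17  # 131072 small values; their exact sum is 2^7 = 128

# Interleaved order: every small addend rounds away against the big total
total_mixed = f32(0.0 + big)
for _ in range(n):
    total_mixed = f32(total_mixed + small)

# Sorted order: small values accumulate first, then land as one chunk
total_sorted = 0.0
for _ in range(n):
    total_sorted = f32(total_sorted + small)
total_sorted = f32(total_sorted + big)

print(total_mixed, total_sorted)  # differ by 128.0
```

The interleaved sum never moves off 2^30, because 2^-10 is below half an ulp at that magnitude, while the sorted sum comes out 128 higher. Same multiset of addends, materially different answers.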

linyufly
11-08-2012, 05:42 PM
I see. Thanks!
