Hello,

We request some clarification regarding implementation of OpenCL
barriers, and in particular, barriers inside conditionals.

Some setup for our question:

The OpenCL 1.2 (revision 19) specification (section 6.12.8
Synchronization Functions) states:

"If barrier is inside a conditional statement, then all
work-items must enter the conditional if any work-item enters the
conditional statement and executes the barrier."

Let's call this "the OpenCL barrier rule".

This wording appears to state a program-counter level
restriction. Each barrier in the code is treated as a different
execution barrier.

Therefore, we believe the OpenCL code:

__kernel
void foo(...)
{
...
if (<condition>)
barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE) // barrier A
else
barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE) // barrier B
...
}

would NOT execute in compliance with OpenCL if work-items from any
particular work group evaluate the condition differently, even though
all work-items from that group would be guaranteed to execute either
barrier A or barrier B.

For example, if there are 3 thread groups in one work group, one
thread group may execute barrier A, and two thread groups may execute
barrier B. We believe this is NOT compliant with OpenCL. Correct?

The results of incorrect execution of the barrier appear to be
implicitly undefined. We did not find any language in the
specification which specifically described how implementation
should handle incorrect barrier execution.

We know that failure to satisfy the OpenCL barrier rule could well
result in a race condition on hardware. Naturally, vendors may seek to
protect their devices from hangs due to incorrect programs.

Now our question:

Would it be a correct implementation of OpenCL if some
combination of software and hardware contrived to enforce the
OpenCL barrier rule.

That is, if the OpenCL code as programmed fails to execute in a
way to satisfy the OpenCL barrier rule, nevertheless, the
implementation enforces it: if one work-item in a work-group
reaches a barrier, all work-items in that workgroup reach that
same barrier.

Such a method, if it can be devised, appears to go beyond merely
protecting hardware from hangs, to enabling forward progress on
code that would otherwise halt (or cause an exception to be
thrown). This has the subtle consequence of perhaps changing the
semantics of the input program.

We ask for a ruling in this matter, and otherwise would
appreciate any further comments or corrections.

Thank you.