Results 1 to 3 of 3

Thread: clarification request: barrier()

  1. #1
    Newbie
    Join Date
    May 2013
    Posts
    2

    clarification request: barrier()

    Hello,

    We request some clarification regarding implementation of OpenCL
    barriers, and in particular, barriers inside conditionals.

    Some setup for our question:

    The OpenCL 1.2 (revision 19) specification (section 6.12.8
    Synchronization Functions) states:

    "If barrier is inside a conditional statement, then all
    work-items must enter the conditional if any work-item enters the
    conditional statement and executes the barrier."

    Let's call this "the OpenCL barrier rule".

    This wording appears to state a program-counter level
    restriction. Each barrier in the code is treated as a different
    execution barrier.

    Therefore, we believe the OpenCL code:

    __kernel
    void foo(...)
    {
    ...
    if (<condition>)
    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE) // barrier A
    else
    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE) // barrier B
    ...
    }

    would NOT execute in compliance with OpenCL if work-items from any
    particular work group evaluate the condition differently, even though
    all work-items from that group would be guaranteed to execute either
    barrier A or barrier B.

    For example, if there are 3 thread groups in one work group, one
    thread group may execute barrier A, and two thread groups may execute
    barrier B. We believe this is NOT compliant with OpenCL. Correct?

    The results of incorrect execution of the barrier appear to be
    implicitly undefined. We did not find any language in the
    specification which specifically described how implementation
    should handle incorrect barrier execution.

    We know that failure to satisfy the OpenCL barrier rule could well
    result in a race condition on hardware. Naturally, vendors may seek to
    protect their devices from hangs due to incorrect programs.

    Now our question:

    Would it be a correct implementation of OpenCL if some
    combination of software and hardware contrived to enforce the
    OpenCL barrier rule.

    That is, if the OpenCL code as programmed fails to execute in a
    way to satisfy the OpenCL barrier rule, nevertheless, the
    implementation enforces it: if one work-item in a work-group
    reaches a barrier, all work-items in that workgroup reach that
    same barrier.

    Such a method, if it can be devised, appears to go beyond merely
    protecting hardware from hangs, to enabling forward progress on
    code that would otherwise halt (or cause an exception to be
    thrown). This has the subtle consequence of perhaps changing the
    semantics of the input program.

    We ask for a ruling in this matter, and otherwise would
    appreciate any further comments or corrections.

    Thank you.

  2. #2
    Junior Member
    Join Date
    Mar 2013
    Posts
    8
    Hello,

    Quote Originally Posted by rickbagley View Post

    __kernel
    void foo(...)
    {
    ...
    if (<condition>)
    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE) // barrier A
    else
    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE) // barrier B
    ...
    }
    the compiler will prolly factorize it that way :

    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE)

    if (<condition>) {
    ...
    }else{

    ...
    }

    Quote Originally Posted by rickbagley View Post

    That is, if the OpenCL code as programmed fails to execute in a
    way to satisfy the OpenCL barrier rule, nevertheless, the
    implementation enforces it: if one work-item in a work-group
    reaches a barrier, all work-items in that workgroup reach that
    same barrier.
    I don't think it is possible. AFAIK barrier are called to ensure write in local/global before the barrier are done by all work items. So it is a synchronization point in the kernel. I suppose it works like that,
    when one work items encounter a barrier, the scheduller swap to another work items until he finds the barrier the first work items encountered. If it doesn't encounter one, the work items won't be able to synchronize (error).
    I don't see how the program/hardware could guess a good reconvergence point for those two work items by itself.

  3. #3
    Newbie
    Join Date
    May 2013
    Posts
    2
    Thank you for taking the time to respond to our query. But your reply was not helpful to us.

    Quote Originally Posted by roger512 View Post
    the compiler will prolly factorize it that way :

    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE)

    if (<condition>) {
    ...
    }else{

    ...
    }.
    We appreciate there is redundancy in this example. For the purposes of this discussion, let's assume that the code will be executed as shown.

    Quote Originally Posted by roger512 View Post
    I don't think it is possible. AFAIK barrier are called to ensure write in local/global before the barrier are done by all work items. So it is a synchronization point in the kernel. I suppose it works like that,
    when one work items encounter a barrier, the scheduller swap to another work items until he finds the barrier the first work items encountered. If it doesn't encounter one, the work items won't be able to synchronize (error).
    I don't see how the program/hardware could guess a good reconvergence point for those two work items by itself.
    We appreciate these comments, which question our scenario. But you do not address our question. If our scenario could be implemented, would the resulting execution of code compiled from an OpenCL program be correct?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •