Thread: Global Barriers?

  1. #1
    Junior Member
    Join Date
    Dec 2008
    Location
    Toronto, Ontario, Canada
    Posts
    16

    Global Barriers?

    Currently I'm writing an algorithm where I need a single (very quick) global barrier, after which processing resumes in parallel as before. So basically I have a large amount of parallel work, then all work-items should hit a barrier; one work-item proceeds past it and does some very quick work, and then all work-items resume past the barrier.

    I don't see that this is possible with OpenCL. The barrier() function is specified to synchronize only the work-items within a single work-group. That isn't good enough, because I need to synchronize across the whole global range (every global_id).

    The other option is to break my kernel into three kernels: kernel_1 does everything in parallel up to the barrier, kernel_2 runs a single work-item that does very little work (a huge waste of time to launch, but required for the algorithm), and finally kernel_3 works in parallel again. Obviously I want to avoid host-side (CPU) management where I can, because it adds overhead that shouldn't be necessary.
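
    For reference, the three-kernel split described above might look roughly like this in OpenCL C. The kernel names follow the post; the buffers `data` and `total`, the parameter `n`, and the placeholder arithmetic are hypothetical. The sketch assumes the host enqueues the three kernels in order on an in-order command queue, with kernel_2 launched with a global size of 1. The `#ifndef` shim at the top is not part of the technique; it is only there so the same source can be built and stepped through with a plain C compiler.

```c
#ifndef __OPENCL_VERSION__
/* Host-side shim (an assumption of this sketch), only so the kernels
   also build with a plain C compiler. An OpenCL compiler skips it. */
#include <stddef.h>
#define __kernel
#define __global
static size_t sim_gid;                       /* work-item id being simulated */
static size_t get_global_id(unsigned dim) { (void)dim; return sim_gid; }
#endif

/* kernel_1: every work-item does the parallel work before the barrier point. */
__kernel void kernel_1(__global float *data)
{
    size_t gid = get_global_id(0);
    data[gid] = data[gid] * data[gid];       /* placeholder parallel work */
}

/* kernel_2: launched with a single work-item. Because kernel_1 has fully
   finished, it may safely read the results of all work-items. */
__kernel void kernel_2(__global const float *data,
                       __global float *total,
                       unsigned int n)
{
    float sum = 0.0f;
    for (unsigned int i = 0; i < n; ++i)
        sum += data[i];                      /* placeholder sequential work */
    total[0] = sum;
}

/* kernel_3: parallel work resumes, using the value kernel_2 produced. */
__kernel void kernel_3(__global float *data, __global const float *total)
{
    size_t gid = get_global_id(0);
    data[gid] = data[gid] - total[0];        /* placeholder parallel work */
}
```

    The boundaries between the kernels are what act as the global barrier here: nothing in kernel_2 runs until all of kernel_1 has completed, and likewise for kernel_3.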

    Normally I wouldn't care... but this is part of a very time-critical algorithm, and I want to ensure this part is as fast as possible.

    Thanks!

  2. #2
    Member
    Join Date
    Nov 2009
    Location
    Scotland
    Posts
    72

    Re: Global Barriers?

    OpenCL only supports synchronization within work-groups. The official way to get a global synchronization is to use multiple kernels, as you pointed out. But rather than 3 kernels, I think you would only need 2: in the first kernel you do all the work up to the barrier, and a single work-item (say the one with global_id 0) also does the sequential work. Then in the second kernel you do the remaining parallel work.
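
    A minimal sketch of that two-kernel split in OpenCL C. The kernel names `phase1` and `phase2`, the buffers `data` and `shared`, and the placeholder arithmetic are assumptions for illustration, not taken from the original algorithm. The sketch assumes both kernels are enqueued on the same in-order command queue, so the second starts only after the first has finished. It also assumes the sequential step only needs values that work-item 0 produced itself: inside the first kernel, work-item 0 has no way of knowing that other work-groups have finished, so a sequential step that reads their results would still need a kernel of its own. The `#ifndef` shim is only there so the source also builds with a plain C compiler.

```c
#ifndef __OPENCL_VERSION__
/* Host-side shim (an assumption of this sketch), only so the kernels
   also build with a plain C compiler. An OpenCL compiler skips it. */
#include <stddef.h>
#define __kernel
#define __global
static size_t sim_gid;                       /* work-item id being simulated */
static size_t get_global_id(unsigned dim) { (void)dim; return sim_gid; }
#endif

/* phase1: parallel work up to the barrier point, with the short
   sequential step folded into work-item 0. */
__kernel void phase1(__global float *data, __global float *shared)
{
    size_t gid = get_global_id(0);
    data[gid] = data[gid] * 2.0f;            /* placeholder parallel work */

    if (gid == 0) {
        /* Placeholder sequential work. It reads only data[0], which this
           same work-item wrote, so no cross-group synchronization is needed. */
        shared[0] = data[0] + 1.0f;
    }
}

/* phase2: remaining parallel work. The kernel boundary guarantees that
   phase1, including the write to shared[0], has completed. */
__kernel void phase2(__global float *data, __global const float *shared)
{
    size_t gid = get_global_id(0);
    data[gid] = data[gid] + shared[0];       /* placeholder parallel work */
}
```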

    There's a paper at this year's CC conference called "Automatic C-to-CUDA Code Generation for Affine Programs". They say they use a "single-writer multiple-reader" technique to achieve synchronization across thread blocks using the global memory space. They don't discuss the performance of this technique, though...

  3. #3
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Global Barriers?

    The "single-writer multiple-reader" thing sounds a lot like one work-item writes a flag in global memory and the others spin on it. That may work, but without guarantees about how the hardware schedules work-groups it might also never complete: if the writer's work-group is not scheduled until the spinning work-groups have finished, they wait forever. (I've heard that it tends to work on Nvidia hardware.)
