Thread: Great performance loss on conditionals

  1. #1
    Hello everybody!

    Although I am only just getting started with OpenCL, I have already noticed a strange effect.
    Namely, my OpenCL code runs very efficiently until I add a very simple "if" statement that chooses the larger of two floats.
    By my estimate, that statement consumes about 2/3 of the algorithm's total time.
    I have tried various ways to avoid using "if":
    - calling max() and fmax(),
    - using a formula based on sign(x - y)

    but it is always too slow.
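
    For reference, here is a minimal sketch in plain C of the kind of variants described above. The actual kernel is not shown in this thread, so the names max_branch, max_select and max_sign are hypothetical, and the arithmetic form is only one possible reading of "a formula based on sign(x - y)". In OpenCL C the built-in fmax() would take the place of the hand-written versions.

```c
/* Hypothetical helpers; the thread does not show the real kernel code. */

/* Variant 1: explicit branch. */
float max_branch(float x, float y)
{
    if (x > y) {
        return x;
    }
    return y;
}

/* Variant 2: conditional expression, which a compiler may turn into a
   select or predicated instruction rather than a jump. */
float max_select(float x, float y)
{
    return (x > y) ? x : y;
}

/* Variant 3: arithmetic form, max = (x + y + sign(d) * d) / 2 with d = x - y.
   Rounding in x + y and x - y can make this inexact for some inputs. */
float max_sign(float x, float y)
{
    float d = x - y;
    float s = (float)((d > 0.0f) - (d < 0.0f)); /* -1, 0 or +1 */
    return 0.5f * (x + y + s * d);
}
```

    All three compute the same value for ordinary inputs, so any large timing difference between them probably comes from somewhere other than the comparison itself.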

    Because of this, the same algorithm runs about three times faster on an AMD Athlon(tm) II X4 620 (2.61 GHz) than on the NVIDIA GeForce 9600 GT I'm using as OpenCL hardware. So the idea of GPU-based computing seems quite "unripe"...

    Are there any general recommendations on how to avoid a dramatic performance loss on conditional statements? Or is it unavoidable?

  2. #2

    What you're seeing is quite likely on GPU hardware. It's not designed for code with complex branching and decision making. What seems weird is the fact you're seeing it on such a small branch which should use instruction predication just fine.

    Are you sure you're fully utilizing the GPU hardware? What is your global size and your work group size? Feel free to post some example code here as well.

  3. #3
    I agree with coleb. It seems like you're running into problems with instruction divergence. (google opencl instruction divergence) However, it also seems strange that this should be so much of a problem since simple branches should be automatically predicated. One thing you can try is putting in a barrier after the conditional so the work-items resync at that point to avoid further instruction divergence in the execution. This might be particularly useful if you have a conditional branch at the beginning followed by a lot of computation.

  4. #4
    Quote Originally Posted by coleb
    Are you sure you're fully utilizing the GPU hardware? What is your global size and your work group size?

    You were right!!!
    The problem was that I had not been using the GPU in the right way, i.e. the work group size was zero.
    Correcting this by passing a non-zero local_work_size to the clEnqueueNDRangeKernel() call (16 x 16 in my case) gave me what I needed.
    Now the GeForce 9600 GT is faster than the CPU (though not by much).

    Thanks a lot for a good idea!
    It was not that obvious from the OpenCL manual how to approach the problem.
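
    A minimal sketch of how the sizes for such a call could be set up, assuming a 2D problem of width x height work-items. The helper names round_up and pick_sizes_2d are hypothetical, and the real problem dimensions are not given in this thread. The rounding is there because OpenCL 1.x requires each global size to be a multiple of the corresponding local size.

```c
#include <stddef.h>

/* Round a global dimension up to the next multiple of the work-group
   dimension (hypothetical helper). */
size_t round_up(size_t global, size_t local)
{
    return ((global + local - 1) / local) * local;
}

/* Fill 2D global and local sizes for a width x height problem,
   using an explicit 16 x 16 work-group as in the post above. */
void pick_sizes_2d(size_t width, size_t height,
                   size_t global[2], size_t local[2])
{
    local[0] = 16;
    local[1] = 16;
    global[0] = round_up(width, local[0]);
    global[1] = round_up(height, local[1]);
}

/* The arrays would then be passed as the global_work_size and
   local_work_size arguments, for example:

   clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, local,
                          0, NULL, NULL);

   Here queue and kernel are assumed to exist already. Because the
   global size may be padded, the kernel should return early when
   get_global_id(0) >= width or get_global_id(1) >= height. */
```

    Whether 16 x 16 is the best choice depends on the device and the kernel, so it is worth timing a few work-group shapes.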

  5. #5
    That sounds a bit strange. A local work-group size of 0 should give an error. (NULL should be fine, as the driver will attempt to pick a good one.) I'd still suggest investigating why the local work-group size should have such an impact on the if statement.

  6. #6
    Quote Originally Posted by dbs2
    That sounds a bit strange. A local work-group size of 0 should give an error. (NULL should be fine, as the driver will attempt to pick a good one.)

    I am sorry for my poor English!
    I meant that a NULL pointer was passed as the local_work_size argument.
    It seems that NVIDIA's current OpenCL implementation is not very good at splitting the global work-items into work-groups automatically.

    A slightly off-topic point (addressed to the forum administrators): would it not be worth creating a sort of "OpenCL troubleshooting" sub-forum, or just a single "always on top" thread, to collect OpenCL programming mistakes and their resolutions in one place?
