Thread: OpenCL Kernel Performance bad (vs CPU)

  1. #1
    Junior Member
    Join Date
    Oct 2013

    I'm writing an ant simulation.
    The kernel performance is very bad: compared to a standard C++ solution, it is at a big performance disadvantage.

    I don't understand why. The operations in the kernel are mostly free of control structures (like if/else).


    I ran a benchmark, and the OpenCL kernel performance is very bad.
    (Left axis: execution time in ms; bottom axis: number of simulated ants.)

    Can you give me advice?

    You can find the whole code in the git repo, if you are interested (the OpenCL stuff is happening here:

    Last edited by Furtano; 04-09-2014 at 08:14 AM.

  2. #2
    Senior Member
    Join Date
    Oct 2012
    Your kernels could be optimized, but the most important parameter when using a GPU is the local work size.

    NVIDIA GPUs, for instance, tend to do well with a local work size of 128 (a multiple of their 32-thread warp size), so you should try again with an explicit local work size (and, of course, a global work size that is a multiple of the local work size).
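
    To make the "global work size a multiple of the local work size" part concrete, here is a minimal sketch of the rounding step. The helper name `round_up_global_size` and the names `queue`, `kernel`, and `num_ants` in the comments are assumptions for illustration, not taken from the repo.

```cpp
#include <cassert>
#include <cstddef>

// Round the number of work-items up to the next multiple of the local work
// size. In OpenCL 1.x the global size must be evenly divisible by an
// explicitly passed local size, otherwise the enqueue call fails.
inline std::size_t round_up_global_size(std::size_t work_items,
                                        std::size_t local_size) {
    assert(local_size > 0);  // a local size of 0 would divide by zero
    return ((work_items + local_size - 1) / local_size) * local_size;
}

// Hypothetical use with the OpenCL C API (queue, kernel, num_ants assumed):
//   size_t local  = 128;
//   size_t global = round_up_global_size(num_ants, local);
//   clEnqueueNDRangeKernel(queue, kernel, 1, nullptr,
//                          &global, &local, 0, nullptr, nullptr);
```

    Because the padded global size can exceed the real number of ants, the kernel would also need a guard along the lines of `if (get_global_id(0) >= num_ants) return;` so the extra work-items do nothing.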

  3. #3
    Junior Member
    Join Date
    Dec 2010
    Not every use case is suitable for a GPU. Your kernel has lots of divergent branches, which are generally bad for GPU performance.
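
    As a rough illustration of what removing a divergent branch can look like, the sketch below computes the same result with and without an if/else. It is plain C++ with hypothetical names (`step_branchy`, `step_branchless`, `left`, `right`) and is not code from the repo; whether such a rewrite actually helps depends on the kernel and on what the compiler already does with short branches.

```cpp
// Branching version: work-items whose comparison differs take different
// paths, which a GPU may have to execute one after the other.
inline float step_branchy(float left, float right) {
    if (left > right) {
        return -1.0f;
    } else {
        return 1.0f;
    }
}

// Branch-free version: every work-item runs the same instructions and the
// comparison result is folded into the arithmetic.
inline float step_branchless(float left, float right) {
    const float go_left = static_cast<float>(left > right);  // 1.0f or 0.0f
    return 1.0f - 2.0f * go_left;
}
```

    In OpenCL C the same idea is usually written with the ternary operator or the built-in `select()` function rather than with a cast.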

  4. #4
    Join Date
    Oct 2010
    Vancouver, Canada
    One thing I notice is that you are reading back several buffers and then writing them again. All this data transfer in/out of the cl_mem buffer objects is going to carry a substantial performance penalty. You want to minimize memory traffic wherever possible, and if you don't need something on the host between kernel calls, don't copy it back.
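
    To put rough numbers on that, here is a small back-of-the-envelope sketch comparing the bytes moved when a buffer is written and read back on every simulation step against uploading once and reading back once at the end. The struct and function names are hypothetical, and the model ignores everything except buffer size and step count.

```cpp
#include <cstddef>

// Bytes moved across the host/device boundary in each direction.
struct TransferCost {
    std::size_t bytes_to_device;
    std::size_t bytes_to_host;
};

// Pattern A: each step writes the buffer to the device and reads it back.
inline TransferCost round_trip_every_step(std::size_t buffer_bytes,
                                          std::size_t steps) {
    return TransferCost{buffer_bytes * steps, buffer_bytes * steps};
}

// Pattern B: upload once, run all steps on the device, read back once.
inline TransferCost keep_on_device(std::size_t buffer_bytes,
                                   std::size_t steps) {
    if (steps == 0) {
        return TransferCost{0, 0};  // nothing runs, nothing is transferred
    }
    return TransferCost{buffer_bytes, buffer_bytes};
}
```

    For example, with an assumed 4000-byte buffer and 100 steps, pattern A moves 400000 bytes in each direction while pattern B moves 4000, so the gap grows linearly with the number of steps.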
