Thread: Looping kernels produce not constant timings

  1. #1
    Join Date
    Mar 2014

    Looping kernels produce not constant timings

    Hi OpenCL community,

    I would appreciate it if any of you could help me with the following issue. I have a program in which I run the same kernels over and over inside a "for" loop. The pseudo-code of my program is the following:

    Code :
    Initialize OpenCL (devices, queue, kernels, create buffers, set arguments, etc.)
       read data
       rewrite buffers (blocking write, CL_TRUE)
       run kernel 1
       run kernel n
       read output buffer
       C functions using the output

    Here tic and toc are time-measurement functions, similar to MATLAB's, which I use to profile the performance of my code. I am not using the OpenCL profiling functions because I am working with the Nexus 10 and they are not working properly there.
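    A minimal sketch of what such tic/toc helpers might look like in C, assuming a POSIX clock_gettime is available. The helpers actually used are not shown in this post, so the names, the single global start time, and the millisecond unit are illustrative assumptions:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime on strict builds */
#endif
#include <time.h>

/* Start time of the current measurement (hypothetical helper state). */
static struct timespec tic_start;

/* Record the current time from the monotonic clock. */
void tic(void)
{
    clock_gettime(CLOCK_MONOTONIC, &tic_start);
}

/* Return the milliseconds elapsed since the last call to tic(). */
double toc(void)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - tic_start.tv_sec) * 1e3
         + (double)(now.tv_nsec - tic_start.tv_nsec) / 1e6;
}
```

    A host-side timer like this measures wall-clock time, so it only reflects kernel execution if the host blocks (for example with clFinish or a blocking read) before toc() is called, and it also includes anything else the device was doing in that interval.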

    My question is the following:
    When I plot the times for all the kernels, I see that they are not roughly constant across iterations, as they should be. A kernel starts at some timing value, then randomly jumps to a higher time for some iterations, and then settles at a time between the minimum (the expected one) and the maximum. Does anyone have a hint as to what may be causing this?

    I tried replacing clFinish with clFlush, using both, and using neither. Also, when I run only one iteration of the process with the same input that produced the maximum time, it works fine and gives the expected minimum time. Finally, if I add a sleep of 100 ms at the end of the loop, the times are constant (at the minimum value) for all the kernels, as they should be.

    Thanks for your time and advice.

    Last edited by lc.carrilloarce; 03-07-2014 at 03:24 PM.

  2. #2
    Senior Member
    Join Date
    Dec 2011
    It could be that other GPU operations are getting "caught" by your clFinish and you're timing those as well, for example OpenGL drawing your screen. Try creating an OpenCL event for each kernel and reading the profiling stats from those events to measure the execution time of the kernels themselves (the command queue has to be created with CL_QUEUE_PROFILING_ENABLE for this). Are those more consistent? You could also use vendor tools to measure kernel performance (e.g., NVIDIA Parallel Nsight, AMD APP Profiler).

    Note: make sure to call clReleaseEvent on each event after you get the stats you need. Also note: to get high-performance OpenCL code, you shouldn't be calling clFinish. Just queue up the work and read back the results with blocking reads.
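    A rough sketch of the per-kernel event timing described above, using the standard OpenCL host API. The function name run_and_time_kernel and the one-dimensional global size are assumptions made for illustration, and the queue is assumed to have been created with CL_QUEUE_PROFILING_ENABLE:

```c
#include <stddef.h>
#include <CL/cl.h>

/* Enqueue one kernel and return its device-side execution time in
 * milliseconds, or a negative value on error.
 * Assumes `queue` was created with CL_QUEUE_PROFILING_ENABLE. */
double run_and_time_kernel(cl_command_queue queue, cl_kernel kernel,
                           size_t global_size)
{
    cl_event evt;
    cl_ulong start = 0, end = 0;

    cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                        &global_size, NULL, 0, NULL, &evt);
    if (err != CL_SUCCESS)
        return -1.0;

    /* Profiling data is only available once the command has completed. */
    err = clWaitForEvents(1, &evt);
    if (err == CL_SUCCESS)
        err = clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,
                                      sizeof(start), &start, NULL);
    if (err == CL_SUCCESS)
        err = clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,
                                      sizeof(end), &end, NULL);

    /* Release the event whether or not the queries succeeded. */
    clReleaseEvent(evt);
    if (err != CL_SUCCESS)
        return -1.0;

    /* Profiling timestamps are reported in nanoseconds. */
    return (double)(end - start) * 1e-6;
}
```

    Waiting on every event serializes the host with the device, so this is meant for measurement only. In production code the events could be collected and queried after the final blocking read instead.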

  3. #3
    Another source of this kind of behavior could be dynamic frequency adjustment in the GPU. As the GPU heats up, the driver could be reducing its frequency, and as it cools, raising the frequency again. You could look for a way to query the current GPU frequency to confirm this.
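    One possible way to check this on a Linux-based device such as Android, sketched under the assumption that the GPU driver exposes its current clock as a plain-text number in a sysfs-style file. Whether such a file exists, and where, is device-specific, so the path is left to the caller and the helper name read_sysfs_long is hypothetical:

```c
#include <stdio.h>

/* Read a single integer (for example a clock frequency) from a
 * sysfs-style text file. Returns -1 if the file cannot be opened or
 * does not start with a number.
 * The path is an assumption supplied by the caller; the correct node
 * depends on the device and driver. */
long read_sysfs_long(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    long value = -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;

    fclose(f);
    return value;
}
```

    Logging this value next to each iteration's timing would show whether the slow iterations line up with a lower reported frequency.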
