
Thread: Looping kernels produce non-constant timings

  1. #1
    Newbie
    Join Date: Mar 2014
    Posts: 3

    Looping kernels produce non-constant timings

    Hi OpenCL community,

    I would appreciate it if any of you could help me with the following issue. I have a program in which I run the same kernels over and over inside a "for" loop. Here is the pseudo-code of my program:

    Code:
    Initialize OpenCL (devices, queue, kernels, create buffers, set arguments, etc.)

    for (each parameter set) {

       read data

       tic()
       write input buffers (blocking write, CL_TRUE)
       toc()

       tic()
       run kernel 1
       clFinish()
       toc()

       ...

       tic()
       run kernel n
       clFinish()
       toc()

       tic()
       read output buffer
       toc()

       tic()
       C functions using the output
       toc()

    }

    Here tic and toc are timing functions, similar to MATLAB's, which I use to profile the performance of my code. I am not using the OpenCL profiling functions because I am working on a Nexus 10, where they do not work properly.
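For reference, MATLAB-style tic() and toc() can be written in C roughly as follows. This is a sketch assuming a POSIX system (Android and Linux qualify) that provides clock_gettime with CLOCK_MONOTONIC; the poster's actual implementation is not shown. Unlike MATLAB's toc, this version returns the elapsed milliseconds instead of printing them.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical MATLAB-style timer: a single global start time,
   so tic()/toc() pairs cannot be nested. */
static struct timespec tic_start;

/* Record the current time as the start of a measured interval. */
static void tic(void)
{
    clock_gettime(CLOCK_MONOTONIC, &tic_start);
}

/* Return the milliseconds elapsed since the most recent tic(). */
static double toc(void)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - tic_start.tv_sec) * 1e3 +
           (double)(now.tv_nsec - tic_start.tv_nsec) / 1e6;
}
```

A monotonic clock is used so that adjustments to the wall clock (for example by NTP) cannot make a measured interval jump or go negative.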

    My question is the following:
    When I plot the timings of all the kernels across iterations, I observe that in some iterations they are not roughly constant, as they should be. The time starts at some value, then randomly jumps to a higher value for a few iterations, and then drops back to a value somewhere between the minimum (the expected one) and that maximum. Does anyone have a hint about what may be causing this?
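To separate genuine jumps from ordinary noise, it can help to quantify the spread of the per-iteration times instead of only inspecting the plot. The helper below is hypothetical (count_slow_iterations is not part of any library): it assumes the fastest iteration is the "expected" time and counts how many iterations exceed it by a chosen factor.

```c
#include <stddef.h>

/* Hypothetical post-processing helper for per-iteration kernel times.
   Stores the fastest time in *min_out and returns how many iterations
   were slower than factor * min. */
static size_t count_slow_iterations(const double *times_ms, size_t n,
                                    double factor, double *min_out)
{
    if (n == 0) {
        *min_out = 0.0;
        return 0;
    }

    /* The fastest iteration is taken as the expected time. */
    double min = times_ms[0];
    for (size_t i = 1; i < n; ++i)
        if (times_ms[i] < min)
            min = times_ms[i];

    /* Count iterations that exceed the expected time by the factor. */
    size_t slow = 0;
    for (size_t i = 0; i < n; ++i)
        if (times_ms[i] > factor * min)
            ++slow;

    *min_out = min;
    return slow;
}
```

For example, with times {2.0, 2.1, 5.0, 5.2, 3.1, 2.0} ms and a factor of 1.5, the minimum is 2.0 ms and three iterations (5.0, 5.2 and 3.1 ms) are counted as slow.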

    I tried replacing clFinish with clFlush, using both, and using neither. Also, when I run only a single iteration with the same input that produced the maximum time, it works fine and gives the minimum expected time. Finally, if I add a 100 ms sleep at the end of the loop, the times are constant (at the minimum value) for all the kernels, as they should be.

    Thanks for your time and advice.

    LC
    Last edited by lc.carrilloarce; 03-07-2014 at 03:24 PM.

  2. #2
    Senior Member
    Join Date: Dec 2011
    Posts: 163
    It could be that other GPU operations, such as OpenGL drawing your screen, are getting "caught" by your clFinish, so you are timing those as well. Try creating an OpenCL event for each kernel and getting the profiling stats from those events to measure the execution time of the kernels themselves. Are those more consistent? You could also use vendor tools to measure kernel performance (e.g., NVIDIA Parallel Nsight, AMD APP Profiler). Note: make sure to call clReleaseEvent on each event after you get the stats you need. Also note: to get high-performance OpenCL code, you shouldn't be calling clFinish after every kernel. Just queue up the work and read back the results with a blocking read.

  3. #3
    Another source of this kind of behavior could be dynamic frequency adjustment in the GPU. As the GPU heats up, its clock frequency may be reduced; as it cools down, the frequency may be raised again. You could look for a way to query the current GPU frequency to confirm this.
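One way to test the frequency hypothesis is to sample the GPU clock in every iteration and plot it alongside the kernel times. How the clock is exposed depends on the SoC, driver and kernel version, so the path is deliberately left to the caller; on Linux-based systems that use the devfreq framework, for example, the current frequency may be readable from a cur_freq file under /sys/class/devfreq/. read_gpu_freq below is a hypothetical helper, and the unit of the value (Hz, kHz or MHz) is driver-specific.

```c
#include <stdio.h>

/* Hypothetical helper: read an integer clock frequency from a
   sysfs-style text file. The path and unit are device-specific and
   must be supplied by the caller. Returns -1 on failure. */
static long read_gpu_freq(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long freq = -1;
    if (fscanf(f, "%ld", &freq) != 1)
        freq = -1;  /* file exists but does not start with a number */

    fclose(f);
    return freq;
}
```

If the sampled frequency drops during the slow iterations and recovers afterwards, that would support the throttling explanation; reading such files may also require elevated permissions on some devices.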
