Thread: How expensive is calling clEnqueueNDRangeKernel?

  1. #1
    Junior Member
    Join Date
    May 2010
    Posts
    2

    How expensive is calling clEnqueueNDRangeKernel?

    I have a function Run() that calls execution of two kernels:

    QUOTE
    void Run()
    {
        // I am using the C++ bindings.
        queue->enqueueNDRangeKernel(*kernelRow, cl::NullRange, *globalRangeRow, *localRangeRow, NULL, eventRow);
        queue->enqueueNDRangeKernel(*kernelColumn, cl::NullRange, *globalRangeCol, *localRangeCol, NULL, eventCol);
        queue->finish();  // block until both kernels have completed
    }


    // As you can see, I'm using events (eventRow, eventCol) for profiling.

    How expensive, in terms of time, is a call to enqueueNDRangeKernel (or clEnqueueNDRangeKernel)?

    With the Nvidia OpenCL Profiler, I got a total execution time (on the GPU) of 351 ms, but when I measured the running time of the Run() method,
    I got 622 ms.

    Why is this difference so large?

    I tested on NVIDIA GT240.
    I also tested on an ATI HD 5670, and there the difference is much smaller.

    When is the data transferred to the GPU: when clEnqueueNDRangeKernel is called, or when the buffer is created (clCreateBuffer)?
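
    For reference, the gap described above can be put into numbers. The sketch below is only an illustration: the helper names device_ms and host_overhead_ms are hypothetical and not part of OpenCL. It assumes device times come from profiling-event timestamps, which OpenCL reports in nanoseconds (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a pair of device timestamps, given in
// nanoseconds (the unit of the OpenCL profiling counters), to milliseconds.
double device_ms(std::uint64_t start_ns, std::uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    return static_cast<double>(end_ns - start_ns) / 1.0e6;
}

// Hypothetical helper: the part of the host wall-clock time that is not
// accounted for by kernel execution on the device.
double host_overhead_ms(double wall_ms, double device_total_ms)
{
    return wall_ms - device_total_ms;
}
```

    With the figures from this post, host_overhead_ms(622.0, 351.0) gives 271 ms that the device-side profile does not account for.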

  2. #2
    Member
    Join Date
    Nov 2009
    Location
    Scotland
    Posts
    72

    Re: How expensive is calling clEnqueueNDRangeKernel?

    The overhead of calling clEnqueueNDRangeKernel should be fairly small.
    My guess is that the problem is the data transfer. If you create the buffer with clCreateBuffer and CL_MEM_COPY_HOST_PTR, the data may only be copied to the device when you call clEnqueueNDRangeKernel, because only then does the runtime know which device will use the data. Try copying the data to your device explicitly with clEnqueueWriteBuffer and see whether it makes a difference.

  3. #3

    Re: How expensive is calling clEnqueueNDRangeKernel?

    There may also be some extra overhead associated with the first launch of a kernel. You should measure several kernel launches and then average the results.
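
    One way to do this is sketched below, as an illustration rather than a definitive harness. The function name average_launch_ms and its parameters are hypothetical; the launch callable stands in for "enqueue the kernels and wait for them", for example the body of Run() including the finish() call. Warm-up calls run first and are not timed, so one-time costs do not skew the average.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: calls `launch` `warmup` times without timing it,
// then `iterations` times under a wall-clock timer, and returns the mean
// time per timed call in milliseconds.
double average_launch_ms(const std::function<void()>& launch,
                         int warmup, int iterations)
{
    assert(warmup >= 0 && iterations > 0);

    // Untimed calls absorb first-launch overhead.
    for (int i = 0; i < warmup; ++i) {
        launch();
    }

    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        launch();
    }
    const auto stop = clock::now();

    // Total elapsed time of the timed calls, expressed in milliseconds.
    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / iterations;
}
```

    A call such as average_launch_ms([] { Run(); }, 1, 20) would then discard the first launch and average the next twenty. The callable must block until the work has finished, otherwise only the enqueue cost is measured.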

  4. #4
    Junior Member
    Join Date
    May 2010
    Posts
    5

    Re: How expensive is calling clEnqueueNDRangeKernel?

    At least with Apple's implementation in Snow Leopard, the first call to enqueueNDRangeKernel() carries a noticeable overhead that scales with the size of the buffers passed as kernel arguments, even if those buffers have already been written to the device. From what I've found, this can be alleviated by invoking a dummy kernel (i.e. one with no instructions) with the same arguments before running the actual kernel. Alternatively, just invoke the original kernel repeatedly, though of course this takes longer.

    EDIT: minor grammatical changes.

  5. #5
    Junior Member
    Join Date
    May 2010
    Posts
    2

    Re: How expensive is calling clEnqueueNDRangeKernel?

    Thanks, Barneybear, you're right.
    I solved my problem simply by invoking a dummy kernel first.
