
Thread: Overhead of Passing the same buffer to different kernels

  1. #1
    Junior Member
    Join Date
    Feb 2014
    Posts
    12

    Question Overhead of Passing the same buffer to different kernels

Hi, I'm a beginner in OpenCL and I have a (maybe) naive question. In my use case I have two kernels that are enqueued sequentially; both need the same buffer object as an input argument (among other arguments), and I'm worried about the overhead of transferring it to the GPU. Currently I do something like this:
    Code :
    // Create cl::Buffer objects
    // Then:
    kernel1.setArg(0, thatBuffer);
    kernel1.setArg(1, bufferForKernel1);
    queue.enqueueNDRangeKernel(kernel1, ...);
    kernel2.setArg(0, thatBuffer);
    kernel2.setArg(1, bufferForKernel2);
    queue.enqueueNDRangeKernel(kernel2, ...);
    My hope is that, done this way, thatBuffer is transferred to the GPU only once and consumed by both kernels, but I fear that each setArg call might trigger a data transfer to the GPU. If so, how can I optimize the data transfer to avoid unnecessary overhead?
    Thanks.

  2. #2
    Senior Member
    Join Date
    Dec 2011
    Posts
    126
    Buffers and Images (and other cl_mem objects in newer versions of OpenCL) are passed to kernels as handles to the memory object. Setting them as kernel arguments is therefore very fast, and you can use the same Buffer in multiple kernels with no extra overhead.

    The actual copying of the contents of the buffer happens with the clEnqueueWriteBuffer / clEnqueueReadBuffer / clEnqueueWriteImage / clEnqueueReadImage / clEnqueueMapBuffer / clEnqueueMapImage / clEnqueueUnmapMemObject commands. You want to minimize those.

    If you create and write your buffer, then run a series of kernels on it, it stays resident on the GPU for all of those kernel executions; afterwards you copy the buffer with the results back, roughly as in the sketch below.
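    A minimal sketch of that pattern with the C++ wrapper (assuming context, queue, kernel1 and kernel2 already exist, that N is the element count, and, purely for illustration, that bufferForKernel2 receives kernel2's output):
    Code :
    #include <CL/cl.hpp>   // OpenCL C++ wrapper
    #include <vector>
     
    // context, queue, kernel1 and kernel2 assumed created elsewhere.
    std::vector<float> hostData(N);    // input, filled on the host
    std::vector<float> results(N);     // will receive kernel2's output
     
    cl::Buffer thatBuffer      (context, CL_MEM_READ_ONLY,  sizeof(float) * N);
    cl::Buffer bufferForKernel1(context, CL_MEM_READ_WRITE, sizeof(float) * N);
    cl::Buffer bufferForKernel2(context, CL_MEM_WRITE_ONLY, sizeof(float) * N);
     
    // One explicit host-to-device transfer of the shared input:
    queue.enqueueWriteBuffer(thatBuffer, CL_TRUE, 0, sizeof(float) * N, hostData.data());
     
    // setArg only records a handle to the memory object; no data moves here:
    kernel1.setArg(0, thatBuffer);
    kernel1.setArg(1, bufferForKernel1);
    kernel2.setArg(0, thatBuffer);
    kernel2.setArg(1, bufferForKernel2);
     
    // thatBuffer stays resident on the device across both kernels:
    queue.enqueueNDRangeKernel(kernel1, cl::NullRange, cl::NDRange(N), cl::NullRange);
    queue.enqueueNDRangeKernel(kernel2, cl::NullRange, cl::NDRange(N), cl::NullRange);
     
    // One explicit device-to-host transfer of the result:
    queue.enqueueReadBuffer(bufferForKernel2, CL_TRUE, 0, sizeof(float) * N, results.data());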

  3. #3
    Junior Member
    Join Date
    Feb 2014
    Posts
    12
    Thanks Dithermaster. But now I'm a bit confused: I don't explicitly enqueue any buffer copy to the GPU. I simply create cl::Buffer objects (using the C++ wrapper API), set the kernel arguments with cl::Kernel::setArg, then enqueue the first kernel for execution with cl::CommandQueue::enqueueNDRangeKernel. It does work, so somewhere under the hood the buffer copy from host memory to the GPU must happen, but it's not clear to me when and where. That's why I asked the question. Your answer makes sense, but it doesn't quite fit my use case...

  4. #4
    The initial data transfer can happen as part of buffer creation, depending on how you created your buffer. If you created it with the CL_MEM_USE_HOST_PTR flag, then on some architectures there may never be a copy at all. On other architectures, or if you create the buffer with CL_MEM_COPY_HOST_PTR, the initial copy happens within the clCreateBuffer call, as sketched below.
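    A rough sketch of the two creation variants with the C++ wrapper (assuming hostData is a std::vector<float> of N elements already filled on the host):
    Code :
    // CL_MEM_COPY_HOST_PTR: the copy happens inside the cl::Buffer (clCreateBuffer)
    // call; hostData may be modified or freed afterwards without affecting the buffer.
    cl::Buffer copied(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                      sizeof(float) * N, hostData.data());
     
    // CL_MEM_USE_HOST_PTR: the implementation may use hostData's memory directly
    // (zero-copy on some architectures), so hostData must stay valid, and should
    // not be touched, while kernels are using this buffer.
    cl::Buffer shared(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
                      sizeof(float) * N, hostData.data());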

    But typically, buffers used in two consecutive kernels don't require additional copies, as Dithermaster explained.

  5. #5
    Junior Member
    Join Date
    Feb 2014
    Posts
    12
    Thanks to everybody, it is clear now.
