Thread: CreateBuffer for Multiple Devices.

  1. #1
    Junior Member
    Join Date: Mar 2011
    Posts: 2

    CreateBuffer for Multiple Devices.

    Hi,

    I want to create an OpenCL context and use multiple OpenCL devices through it.

    Since the CreateBuffer API takes only an OpenCL context to say where the buffer belongs, with no device-related parameter such as a command queue or device ID, I am just wondering how the CreateBuffer method will work in a multi-device scenario. In other words, if a context is associated with multiple devices, for which device will the CreateBuffer API allocate memory?

    Also, Appendix 1.0 of the OpenCL spec says that memory objects created using a context are shared across multiple command queues, and hence across multiple devices.

    So it is not clear to me how CreateBuffer works in a context with multiple associated devices, i.e. on which of those devices the cl_mem object actually gets allocated.

    Can anyone shed some light on this issue?

    Thanks
    Seshadri

  2. #2
    Senior Member
    Join Date: May 2010
    Location: Toronto, Canada
    Posts: 845

    Re: CreateBuffer for Multiple Devices.

    Quote:
    I am just wondering how the CreateBuffer method will work in a multi-device scenario. In other words, if a context is associated with multiple devices, for which device will the CreateBuffer API allocate memory?
    The OpenCL driver will take care of that. From the point of view of the application, the buffer is shared among all devices in that context.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date: Mar 2011
    Posts: 2

    Re: CreateBuffer for Multiple Devices.

    Thanks, David, for the prompt response.

    I just want to add some observations I made in the meantime, after searching the internet following my post.

    It is not clearly defined in the OpenCL specification, so it is implementation-specific, as confirmed on the NVIDIA forum:

    http://forums.nvidia.com/index.php?show ... try1192390

    This is how it works on NVIDIA devices:

    “On NVIDIA GPUs the actual memory to hold the buffer in device memory is not allocated until the device is specifically addressed to use the data. For read-only buffers, this would be when a clEnqueueWrite* command is issued to that device's command-queue. For write-only buffers, this is even trickier. The actual allocation will take place on the first execution of a kernel that has the buffer set as one of its arguments, or at the first call to a clEnqueueRead* command for that buffer on a command queue associated with the device.”

    So on NVIDIA GPUs we can simply assume that no device allocation takes place at the time of CreateBuffer.

    “OpenCL does not assume that data can be transferred directly between devices within the same context, so such a behavior is implementation specific. Technically, you need to explicitly transfer the data from one device to the other, by issuing a clEnqueueRead* command on the command queue attached to the 1st device, and then a synchronized clEnqueueWrite* command on the command queue of the 2nd device. This of course transfers data through the host. The same cl_mem object is used in both commands.”

  4. #4
    Senior Member
    Join Date: May 2010
    Location: Toronto, Canada
    Posts: 845

    Re: CreateBuffer for Multiple Devices.

    Quote:
    “OpenCL does not assume that data can be transferred directly between devices within the same context, so such a behavior is implementation specific. Technically, you need to explicitly transfer the data from one device to the other, by issuing a clEnqueueRead* command on the command queue attached to the 1st device, and then a synchronized clEnqueueWrite* command on the command queue of the 2nd device. This of course transfers data through the host. The same cl_mem object is used in both commands.”
    The person who wrote the quote above is unfortunately mistaken. It's the driver's responsibility to transfer data transparently between different devices within a context (if necessary). Memory objects are available to all devices in the same context as if they were shared.

    If you need further reassurance, I can try to summon the OpenCL spec editor, but I'd rather not bother him.

    This other quote, however, is true as far as I know:

    “On NVIDIA GPUs the actual memory to hold the buffer in device memory is not allocated until the device is specifically addressed to use the data. For read-only buffers, this would be when a clEnqueueWrite* command is issued to that device's command-queue. For write-only buffers, this is even trickier. The actual allocation will take place on the first execution of a kernel that has the buffer set as one of its arguments, or at the first call to a clEnqueueRead* command for that buffer on a command queue associated with the device.”
    This behavior is allowed in the specification.

  5. #5
    Junior Member
    Join Date: Jan 2010
    Posts: 2

    Re: CreateBuffer for Multiple Devices.

    Quote Originally Posted by david.garcia
    The person who wrote the quote above is unfortunately mistaken. It's the driver's responsibility to transfer data transparently between different devices within a context (if necessary). Memory objects are available to all devices in the same context as if they were shared.
    Here is my scenario:
    - Dataset too big for one device
    - 4 devices
    - Dataset split into 4 buffers in a 3D matrix decomposition
    - Border data must be exchanged between the devices in each iteration.

    How do I do that? What commands should I use?

    Should I use enqueueWrite/Read/Copy between the buffers?

    Should I have some small extra border buffers and extra kernels to copy data from the border buffers to the main buffer? Would the extra border buffers automagically be copied from one device to the other if I run a write kernel on one device and a read kernel on another?

    I have posted this question in multiple forums, but I am not getting any clear answers.

    - Atle

  6. #6
    Senior Member
    Join Date: May 2010
    Location: Toronto, Canada
    Posts: 845

    Re: CreateBuffer for Multiple Devices.

    Quote:
    Should I use enqueueWrite/Read/Copy between the buffers?

    Should I have some small extra border buffers and extra kernels to copy data from the border buffers to the main buffer? Would the extra border buffers automagically be copied from one device to the other if I run a write kernel on one device and a read kernel on another?
    Both are valid solutions. I would probably use clEnqueueCopyBuffer() to propagate border information across devices, since that keeps the kernel source code more readable. I don't think there would be a big performance difference between the two solutions you suggest.

    This is not really anything special about OpenCL if you think about it. It's the same problem you would need to solve if you were programming in C or MPI.
