
Thread: Running the same kernel on multiple devices

  1. #1
    Member
    Join Date
    Sep 2009
    Posts
    35

    Running the same kernel on multiple devices

    What is the general technique when you want multiple devices (in the same context) to run the same kernel on the same memory? How do you split the workload? (For example, I want the first device to calculate the first half of the job and the second device the second half, in the same memory.)

    The global_work_offset parameter of clEnqueueNDRangeKernel would be really handy for that, but it currently isn't supported.

    Thank you

  2. #2
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Running the same kernel on multiple devices

    The general approach is to create separate cl_mem objects for each device and size them dynamically based on each device's performance. I would be a bit wary of using one cl_mem object and accessing it from two devices at once. While the CL model should allow a shared cl_mem to work as long as you don't write the same data from both devices, you may end up copying all the data to both devices, because CL has no way of knowing how much of the data you are going to touch. Unfortunately this splitting isn't automated, and if you have a heterogeneous mix of devices you'll need to adjust how much each one processes to get the best throughput.
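
    In rough code, that splitting might look like the sketch below (ctx, queue0, queue1, host_data, n and ratio are placeholder names of mine, not anything from the spec):

        #include <CL/cl.h>

        /* One cl_mem per device, each sized to that device's share of the work. */
        void split_buffers(cl_context ctx, cl_command_queue queue0, cl_command_queue queue1,
                           const float *host_data, size_t n, double ratio)
        {
            cl_int err;
            size_t n0 = (size_t)(n * ratio);   /* device 0's share */
            size_t n1 = n - n0;                /* device 1's share */
            cl_mem buf0 = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n0 * sizeof(float), NULL, &err);
            cl_mem buf1 = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n1 * sizeof(float), NULL, &err);
            /* Copy each slice of the host array into the matching buffer. */
            clEnqueueWriteBuffer(queue0, buf0, CL_FALSE, 0, n0 * sizeof(float), host_data,      0, NULL, NULL);
            clEnqueueWriteBuffer(queue1, buf1, CL_FALSE, 0, n1 * sizeof(float), host_data + n0, 0, NULL, NULL);
        }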

  3. #3
    Member
    Join Date
    Sep 2009
    Posts
    35

    Re: Running the same kernel on multiple devices

    I was afraid of that. I assume this also applies to devices linked with SLI or Crossfire?

    For my problem, the logical solution requires that the whole memory be readable inside the kernel. It's no fun having to complicate things.
    On what basis should one decide how to slice the task and distribute the slices among the devices? Should it be a function of device type, number of compute units, and clock frequency?

    A working global_work_offset would make all of this much simpler.

    Thank you

  4. #4
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Running the same kernel on multiple devices

    I honestly have no idea how SLI and Crossfire work. For rendering, my impression was that they just alternate frames, which means all textures are replicated across both cards. I don't know what they do for compute.

    You can get a rough estimate of how to divide up the work from each device's frequency and number of compute units. However, the best thing to do is just divide it 1/n to start, measure the time over a few iterations, and re-balance.
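
    For what it's worth, here's a sketch of that measure-and-rebalance step; it assumes the queues were created with CL_QUEUE_PROFILING_ENABLE, and all the names are made up:

        #include <CL/cl.h>

        /* Run one iteration on both devices, then return the fraction of the
           work device 0 should get next time, based on measured speed. */
        double rebalance(cl_command_queue queue0, cl_command_queue queue1,
                         cl_kernel k0, cl_kernel k1, size_t n0, size_t n1)
        {
            cl_event ev0, ev1;
            clEnqueueNDRangeKernel(queue0, k0, 1, NULL, &n0, NULL, 0, NULL, &ev0);
            clEnqueueNDRangeKernel(queue1, k1, 1, NULL, &n1, NULL, 0, NULL, &ev1);
            clFinish(queue0);
            clFinish(queue1);
            cl_ulong s0, e0, s1, e1;
            clGetEventProfilingInfo(ev0, CL_PROFILING_COMMAND_START, sizeof(s0), &s0, NULL);
            clGetEventProfilingInfo(ev0, CL_PROFILING_COMMAND_END,   sizeof(e0), &e0, NULL);
            clGetEventProfilingInfo(ev1, CL_PROFILING_COMMAND_START, sizeof(s1), &s1, NULL);
            clGetEventProfilingInfo(ev1, CL_PROFILING_COMMAND_END,   sizeof(e1), &e1, NULL);
            /* Per-item time on each device; hand out work in inverse proportion. */
            double t0 = (double)(e0 - s0) / (double)n0;
            double t1 = (double)(e1 - s1) / (double)n1;
            return (1.0 / t0) / (1.0 / t0 + 1.0 / t1);
        }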

    I suspect the global offset will arrive in the future, but I don't know when. Until then you can always pass an int4 into your kernel and then use:
    idx = get_global_id(0) + my_offset.x;
    and the like. This adds one extra add instruction, but it gets you the global offset. You then just enqueue, for each device, a global size equal to the portion it should process, and you'll have the same thing.
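
    Spelled out as a sketch (the kernel name scale, my_offset, buf, queue1, n0 and n1 are all just example names of mine):

        __kernel void scale(__global float *data, int4 my_offset)
        {
            /* Map this device's local range back into the full index space. */
            int idx = get_global_id(0) + my_offset.x;
            data[idx] = data[idx] * 2.0f;   /* any per-element work */
        }

    And on the host, for the second device:

        /* Device 1 handles elements [n0, n0 + n1); enqueue only its share.
           A plain array of four ints has the same size and layout as the
           kernel's int4 argument. */
        cl_int off[4] = { (cl_int)n0, 0, 0, 0 };
        clSetKernelArg(kernel, 1, sizeof(off), off);
        size_t global = n1;
        clEnqueueNDRangeKernel(queue1, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);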

  5. #5
    Member
    Join Date
    Sep 2009
    Posts
    35

    Re: Running the same kernel on multiple devices

    Thanks for the hints.
    I was thinking of the same offset trick. I just hope OpenCL is smart enough to manage one read/write buffer across all devices. I promise every device will write only to its own spots in the buffer.

  6. #6
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Running the same kernel on multiple devices

    OpenCL isn't smart enough to manage one read/write buffer across multiple devices. All you're guaranteed (and I'm not even sure that's the case, since this behaviour might fall into the undefined category) is that if you enqueue a write and then do a finish, you should read the same data on all devices that read the memory object.

    If you write back, there's no telling what you'll get. The implementation might take the copy on device A and ignore the copy on device B, so any writes done on device B to a memory object shared by the two would be lost. You are probably okay as long as you are just reading (although you may use more memory), but you have to call finish after loading data into the cl_mem, as there is no implicit synchronization between devices. This behavior may also differ between platforms, but calling clFinish() should be reasonably safe.
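
    In code, the pattern would be roughly this (queue and buffer names are mine; input is assumed to be a buffer both devices only read):

        /* Load the shared read-only data once, force completion, then let
           both devices launch. No implicit sync exists between the queues. */
        clEnqueueWriteBuffer(queue0, input, CL_FALSE, 0, bytes, host_data, 0, NULL, NULL);
        clFinish(queue0);   /* data is guaranteed in place before anyone reads */
        clEnqueueNDRangeKernel(queue0, kernel0, 1, NULL, &n0, NULL, 0, NULL, NULL);
        clEnqueueNDRangeKernel(queue1, kernel1, 1, NULL, &n1, NULL, 0, NULL, NULL);
        clFinish(queue0);
        clFinish(queue1);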

  7. #7
    Member
    Join Date
    Sep 2009
    Posts
    35

    Re: Running the same kernel on multiple devices

    Quote Originally Posted by dbs2
    If you write back, there's no telling what you'll get. The implementation might take the copy on device A and ignore the copy on device B, so any writes done on device B to a memory object shared by the two would be lost. You are probably okay as long as you are just reading ...
    Since I manage the quasi-splitting (offsetting) myself, it's not a problem for me to enqueue multiple reads, one per command queue: first the part only A was writing, then the part only B was writing, if that method guarantees me correct data.
    If that doesn't do the job, I'll just create separate output buffers for each device.

  8. #8
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Running the same kernel on multiple devices

    If you have two devices writing to the same cl_mem object at the same time, there is no guarantee in OpenCL that you will get the merged results in the end, regardless of whether they write to unique sections. You will have to use two cl_mem objects for writing. Reading should work better, but, again, this is pretty much "undefined" behavior if I understand correctly.
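
    That is, something along these lines (again, all names are made up):

        /* Separate output buffer per device; read each back into the matching
           half of one host array, so the halves are merged for free. */
        cl_mem out0 = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n0 * sizeof(float), NULL, &err);
        cl_mem out1 = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n1 * sizeof(float), NULL, &err);
        /* ... set args, enqueue the kernels on queue0 and queue1 ... */
        clEnqueueReadBuffer(queue0, out0, CL_FALSE, 0, n0 * sizeof(float), result,      0, NULL, NULL);
        clEnqueueReadBuffer(queue1, out1, CL_FALSE, 0, n1 * sizeof(float), result + n0, 0, NULL, NULL);
        clFinish(queue0);
        clFinish(queue1);   /* result now holds the merged output */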

  9. #9
    Member
    Join Date
    Sep 2009
    Posts
    35

    Re: Running the same kernel on multiple devices

    So the proper way for devices to get the right read-only buffer data would be to use clEnqueueWriteBuffer and clFinish on all command queues (all devices) that use the same cl_mem object.

    On a more philosophical note: would it be possible to clEnqueueWriteBuffer and clFinish on *only one* command queue, and have OpenCL automatically prepare the data for sharing across every device in the context?

    Contexts are, at the moment, of little use if memory sharing between devices in the same context is this hard.

  10. #10
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Running the same kernel on multiple devices

    The proper way for devices to get the right read-only buffer that they can use *at the same time* is to have two buffers. CL makes no guarantees about the consistency of data used by multiple devices at the same time. It will, however, ensure that if you use data on one device in a context and then on another *after the first one is done*, the data is up-to-date.

    So if you want to share data between two devices at the same time you should duplicate it, though you can try accessing it from both at once. The latter is not covered by the spec, so the behavior (while probably mostly what you expect) is, as far as I know, undefined. The details should be in the CL spec, but basically the only consistency guarantee is that non-simultaneous uses of data in a context will always see the most recent data on the device being used.
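
    Concretely, the duplicate-the-data version is just a sketch like this (names are made up):

        /* Give each device its own full copy of the shared input, then use
           them simultaneously without cross-device consistency worries. */
        cl_mem in0 = clCreateBuffer(ctx, CL_MEM_READ_ONLY, bytes, NULL, &err);
        cl_mem in1 = clCreateBuffer(ctx, CL_MEM_READ_ONLY, bytes, NULL, &err);
        clEnqueueWriteBuffer(queue0, in0, CL_FALSE, 0, bytes, host_data, 0, NULL, NULL);
        clEnqueueWriteBuffer(queue1, in1, CL_FALSE, 0, bytes, host_data, 0, NULL, NULL);
        clFinish(queue0);
        clFinish(queue1);   /* both copies now hold identical data */

    It costs memory, but it is the only arrangement the spec actually promises will work.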


