Thread: clEnqueueCopyBuffer and multiple devices

  1. #1
    Junior Member
    Join Date
    Apr 2011
    Posts
    20

    clEnqueueCopyBuffer and multiple devices

    Hi

    I am struggling again with the relationship (or lack thereof) of buffers and devices.

    Assume I have a buffer that I previously wrote to one device. I'd like to copy this buffer to another device (both are GPUs). Which queue should I use in the call to clEnqueueCopyBuffer so I can be sure that the data is present on the second device afterwards?

    thx

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: clEnqueueCopyBuffer and multiple devices

    Banjobeni,

    As discussed in the previous thread, buffer objects are associated with contexts, not with devices. You do not need to "move" buffer objects from one device to another.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date
    Apr 2011
    Posts
    20

    Re: clEnqueueCopyBuffer and multiple devices

    Yes, I do. I am programming with multiple devices.

    I have a consistency model in place which is different from what OpenCL enforces.
    I have a data management engine that ensures data prefetching is done as soon as it is determined where an operation will run and that not-needed buffers are freed when I run out of space on a device.
    I have a scheduler in place that queues operations to devices in a HEFT (Heterogeneous Earliest Finish Time) model.

    I depend on knowing where the data is at any given moment. I do my own load balancing and take data availability and movement time into account. Therefore I use one buffer per data item per device, and for the same reason I need to be able to copy between devices.
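The device-selection step described here can be sketched roughly as follows. This is only an illustration of the earliest-finish-time idea, not the poster's scheduler: `DeviceState`, `finish_time`, and `pick_device` are hypothetical names, the cost numbers are made up, and the full HEFT algorithm also orders tasks by an upward rank, which is omitted. The sketch assumes a transfer can overlap with work already queued on the target device, in line with the prefetching mentioned above.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Hypothetical per-device bookkeeping for one pending operation."""
    name: str
    ready_at: float       # time at which the device finishes its queued work
    has_data: bool        # True if the input buffer is already on this device
    transfer_time: float  # estimated time to copy the input to this device
    compute_time: float   # estimated run time of the operation on this device


def finish_time(dev: DeviceState, now: float = 0.0) -> float:
    """Estimated finish time of the operation if it is scheduled on dev."""
    # Data that is already resident does not need to be moved.
    data_ready = now if dev.has_data else now + dev.transfer_time
    # The operation starts once the device is free AND its input has arrived.
    start = max(dev.ready_at, data_ready)
    return start + dev.compute_time


def pick_device(devices, now: float = 0.0) -> DeviceState:
    """Choose the device with the earliest estimated finish time."""
    return min(devices, key=lambda d: finish_time(d, now))


devices = [
    # Holds the data already, but is busy until t = 5.
    DeviceState("gpu0", ready_at=5.0, has_data=True, transfer_time=2.0, compute_time=4.0),
    # Idle, but the input must be copied over first.
    DeviceState("gpu1", ready_at=0.0, has_data=False, transfer_time=2.0, compute_time=4.0),
]
best = pick_device(devices)
print(best.name)  # gpu1
```

In this made-up case the idle device wins even though it has to receive a copy first, which is why such a scheduler needs to know where each copy of the data currently lives.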

    If you could just answer my question - which queue should I take in order to make sure that the data is actually copied to the correct device? The one on which the data already is or the one to which the data shall be copied?

  4. #4
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: clEnqueueCopyBuffer and multiple devices

    It seems there's a miscommunication between us.

    Even in the presence of multiple devices, OpenCL associates each buffer with a context, not with a particular device. In particular, there is no way to query which device a particular buffer object resides on -- in fact, it may reside on multiple devices at once if, for example, all devices are reading from it and none is writing.

    Hence, your question of "which queue should I take in order to make sure that the data is actually copied to the correct device?" has no meaning according to OpenCL. At best you may get answers of what particular implementations of OpenCL in particular devices will do.

  5. #5
    Junior Member
    Join Date
    Apr 2011
    Posts
    20

    Re: clEnqueueCopyBuffer and multiple devices

    David,

    I do understand your point. But, given the above scenario, there has to be guidance on which queue to use. If the guidance says it does not matter - that's fine. But this is a clearly phrased question and there must be a clear answer.

    Please have a look at the following code. It is written in C# but it maps one to one to OpenCL calls and the mapping should be clear. There is a short summary beneath.

    Code :
        static class Program {
     
            static readonly String Kernel = "kernel void Kernel(global float * b) { b[get_global_id(0)]++; }";
     
            static readonly Int64 Count = 1024 * 1024 * 64;
     
            static void Main(String[] arguments) {
                Platform platform = Platform.GetPlatforms()[0];
                Device[] devices = platform.GetDevices(DeviceType.Gpu);
                Device device0 = devices[0];
                Device device1 = devices[1];
                Context context = new Context(devices);
                CommandQueue queue0 = new CommandQueue(context, device0, CommandQueueFlags.None);
                CommandQueue queue1 = new CommandQueue(context, device1, CommandQueueFlags.None);
                Single[] array = new Single[Count];
                Buffer buffer = context.CreateBuffer((UInt64)System.Buffer.ByteLength(array), BufferFlags.ReadWrite);
                CompiledProgram program = context.Compile(Kernel, null);
                Kernel kernel = program.CreateKernel("Kernel");
                kernel.SetGlobalArgument(0, buffer);
                queue1.WriteBuffer(array, buffer);
                EventObject e1 = queue0.StartKernel(kernel, new Int64[] { Count / sizeof(Single) }, null);
                EventObject e2 = queue1.StartKernel(kernel, new Int64[] { Count / sizeof(Single) }, null, e1);
                EventObject e3 = queue1.StartReadBuffer(buffer, array, e2);
                context.WaitForEvents(e3);
            }
        }

    In short, it does:
    • Create a context over two devices
    • Create a kernel which, when run, increments a bunch of floats
    • Create a buffer that holds the floats
    • Move data to the buffer
    • Run the kernel on the first device
    • Run the kernel on the second device
    • Read back the results into host memory


    I'd expect this to increment all values by two. However, it only increments them by one. So either one of the two kernel calls does not get the correct buffer, or the read actually returns stale data.

    Can you explain this according to the OpenCL 1.1 specification?
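A side note on the listing above, separate from the consistency question: the array holds `Count` floats, but each kernel launch is given `Count / sizeof(Single)` as its size. The arithmetic below is a small check of what that implies, assuming (this is not confirmed in the thread) that the wrapper's `StartKernel` takes the global work size in work-items, as `clEnqueueNDRangeKernel` does. `SIZEOF_SINGLE` and the variable names are illustrative.

```python
SIZEOF_SINGLE = 4  # bytes in a C# Single (32-bit float)

count = 1024 * 1024 * 64               # number of floats in the array, as in the listing
buffer_bytes = count * SIZEOF_SINGLE   # what System.Buffer.ByteLength(array) returns
global_size = count // SIZEOF_SINGLE   # Count / sizeof(Single), as passed to StartKernel

elements_in_buffer = buffer_bytes // SIZEOF_SINGLE
fraction_touched = global_size / elements_in_buffer

print(elements_in_buffer)  # 67108864
print(global_size)         # 16777216
print(fraction_touched)    # 0.25
```

Under that assumption each launch would touch only the first quarter of the buffer. This would not by itself explain elements being incremented once instead of twice, but it is worth checking when verifying the results.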

  6. #6
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: clEnqueueCopyBuffer and multiple devices

    "I do understand your point. But, given the above scenario, there has to be guidance on which queue to use. If the guidance says it does not matter - that's fine. But this is a clearly phrased question and there must be a clear answer."

    It does not matter which queue you use. In both cases the memory object will be made available to all devices in the context -- because memory objects are associated with contexts, not with devices.

    Your example application should indeed increase the contents of the buffer by two. Either there's a bug in the application that I haven't noticed, or there's a bug in the C# bindings or in the OpenCL implementation that you are using.

    Is it possible that "queue1.WriteBuffer(array, buffer);" is performing a non-blocking write? Or that any of the operations is returning an error?

    The spec defines memory consistency between devices in terms of "synchronization points". If you search the spec for the words "synchronization point" you will find all the related language. As you will find, event dependencies act as synchronization points so your application should work as intended.

  7. #7
    Junior Member
    Join Date
    Apr 2011
    Posts
    20

    Re: clEnqueueCopyBuffer and multiple devices

    Thanks a lot. I re-read the details about synchronization points and I came to the conclusion that there must be a bug in the ATI implementation.

    Out of curiosity I implemented the same functionality using a different OpenCL wrapper (Cloo). To my complete surprise, it worked. I then migrated the working version step by step towards my original version, which suffers from this bug. I realized that in the Cloo version I had accidentally included the CPU device in addition to the two GPU devices. That is, the only difference was that the CPU device was also part of the context.

    I conclude that the ATI OpenCL implementation has a bug that compromises buffer consistency across multiple GPU devices when only GPU devices are present in the context.

    Where can I report this issue?

  8. #8
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: clEnqueueCopyBuffer and multiple devices

