
Thread: Non-blocking Write Always Completes Without Error

  1. #1
    Junior Member
    Join Date
    Apr 2011
    Posts
    20

    Non-blocking Write Always Completes Without Error

    Hi

    This is my first post in this forum, so if I chose the wrong category, please be so kind as to move this post to where it belongs.

    I am playing around with the non-blocking use of the various enqueue calls. I must say I really like the way the OpenCL Wait Objects are set up.

    I came across one issue, though, that I couldn't explain. I don't think it makes sense to show my code here, because it uses a self-made, Cloo-like .NET wrapper around the OpenCL functions.

    I logically do the following:
    • Create a context for one device
    • Create a command queue
    • In an endless loop, until either OpenCL returns an error (synchronously) or I run out of host memory:
        • Allocate a large chunk of memory on the host
        • Create a new buffer of the same size
        • Enqueue a non-blocking copy from the host memory to the buffer (clEnqueueWriteBuffer)
        • Put the memory, buffer and returned event in a list so I can still reference them afterwards

    When I execute this, I always run out of host memory first. It seems I can move much more data to the device (a GPU) than it actually has memory to store.

    So far, so good. This was expected: these are non-blocking calls, so they either won't report an error immediately or will simply wait until there is enough space. Now comes the issue: when I check the execution status (clGetEventInfo), all the events are CL_COMPLETE.

    From the spec I would have expected the execution status of the later events to be one of two options:
    • CL_QUEUED, to indicate that the operation is queued but cannot run because there is not enough space available on the GPU
    • An out-of-memory error code, to indicate that there is not enough space available on the device


    Is what I observe the correct behaviour? Where does the overflowing data live, then?

    Any explanation is highly appreciated.

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Non-blocking Write Always Completes Without Error

    Wow, this is interesting. What implementation of OpenCL are you using (AMD/Nvidia)? What's your hardware?

    A sufficiently clever driver running on hardware that supports mapping host memory into the device's address space may simply be reusing the pointer you passed to clEnqueueWriteBuffer as device memory. Even though this should be possible (at least in some cases) it would require some trickery with the OS to make it all work.

    Have you verified that the data stored in the buffer objects in fact has the same contents as the host memory?

    When you clCreateBuffer(), what arguments do you pass as read/write flags? Are they read-only by any chance?

    My hat goes off to the folks that wrote that driver if this is all true.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date
    Apr 2011
    Posts
    20

    Re: Non-blocking Write Always Completes Without Error

    Hey, thanks for your answer!

    I am using NVidia's implementation on my laptop, which features a Quadro FX 880M.

    I did not verify data integrity yet. I will later today.

    What would you have expected: an out-of-memory error or CL_QUEUED? Depending on the answer, I will double-check my wrapper.

    I do pass ReadWrite as flags to the buffer creation.

    Thanks again.

  4. #4
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Non-blocking Write Always Completes Without Error

    Quote
    What would you have expected: an out-of-memory error or CL_QUEUED? Depending on the answer, I will double-check my wrapper.
    I would have expected an out-of-memory error when clEnqueueWriteBuffer() is called, since NVidia's implementation defers memory allocations until the point where a buffer is used.

    Also make sure to pass a pfn_notify function when you call clCreateContext(); some errors can only be returned through that callback.

  5. #5
    Junior Member
    Join Date
    Apr 2011
    Posts
    20

    Re: Non-blocking Write Always Completes Without Error

    Quote Originally Posted by david.garcia
    I would have expected out of memory when clEnqueueWriteBuffer() is called since NVidia's implementation defers memory allocations until the point where a buffer is used.
    Isn't lazy memory allocation a product of the fact that buffers do not live on a device by default? At least I don't see any way to specify which device the buffer is intended to live on when creating it.

    How does ATI differ here? I only develop on my laptop occasionally; my desktop PC has two powerful ATI cards.

    Quote Originally Posted by david.garcia
    Also make sure to pass a pfn_notify function when you call clCreateContext(); some errors can only be returned through that callback.
    That is a good idea. I'll try this.

    By the way, when switching from non-blocking to blocking writes, I get an out-of-memory return code after a predictable number of iterations. I suspect a bug.

    Assuming that NVidia really does some trickery and uses the host pointer: I strongly dislike it. OpenCL is a very low level abstraction around compute platforms. Not being able to precisely specify when allocations occur on the device and more importantly when data is being copied will lead to decreased performance in specific situations, and there is no way for the implementation to overcome this without introducing overhead again.

  6. #6
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Non-blocking Write Always Completes Without Error

    Quote
    Isn't lazy memory allocation a product of the fact that buffers do not live on a device by default?
    Yes, you got that right

    Quote
    At least I don't see any way to specify which device the buffer is intended to live on when creating it.
    Correct.

    Quote
    How does ATI differ here? I only develop on my laptop occasionally; my desktop PC has two powerful ATI cards.
    I imagine they do the same, but I haven't checked to be honest.

    Quote
    By the way, when switching from non-blocking to blocking writes, I get an out-of-memory return code after a predictable number of iterations. I suspect a bug.
    Interesting. What are these iterations doing? Couldn't this error simply be a consequence of lazy memory allocation? After all, if the command is non-blocking and there's not enough memory, the driver could simply wait a bit and try later. This sounds much more feasible than being able to reuse host memory, particularly since you mentioned your buffers are read/write.

    Quote
    Assuming that NVidia really does some trickery and uses the host pointer: I strongly dislike it. OpenCL is a very low level abstraction around compute platforms. Not being able to precisely specify when allocations occur on the device and more importantly when data is being copied will lead to decreased performance in specific situations, and there is no way for the implementation to overcome this without introducing overhead again.
    OpenCL is not that low level, whether for good or for bad. Specifically, the driver handles all necessary memory transfers between devices and between host and devices transparently to the application. Some have argued that buffers should be explicitly associated with specific devices and applications should be required to explicitly transfer buffer ownership from one device to another. I don't remember why the committee chose the alternative we have today -- I'm sure there are compelling reasons for it. In any case, at this point we have to work with what we have.

  7. #7
    Junior Member
    Join Date
    Apr 2011
    Posts
    20

    Re: Non-blocking Write Always Completes Without Error

    Quote Originally Posted by david.garcia
    Interesting. What are these iterations doing?
    Please refer to my original post. One iteration consists of allocating memory on the host, creating a buffer, and moving data from the allocated host memory to the created buffer.


    Quote Originally Posted by david.garcia
    Couldn't this error simply be a consequence of lazy memory allocation?
    For the blocking call, sure. The error is totally expected. In the case of the non-blocking call, however, there is no error. Each of the event objects' status is set to CL_COMPLETE, not a single one of them fails. And that very fact is what concerns me.

    Quote Originally Posted by david.garcia
    After all, if the command is non-blocking and there's not enough memory, the driver could simply wait a bit and try later.
    Exactly. Queue the operation and only carry it out when it's possible, that is, when there's enough memory available to actually do it.

    Quote Originally Posted by david.garcia
    This sounds much more feasible than being able to reuse host memory, particularly since you mentioned your buffers are read/write.
    I have no idea how using the host pointer actually works, so I try to avoid it in my work.

    Quote Originally Posted by david.garcia
    OpenCL is not that low level, whether for good or for bad.
    Even though OpenCL tries to be on a higher level, it certainly is not. Just look at stuff like host memory pointers, mappings and the like.

    In my opinion, OpenCL combines the disadvantages of both worlds here: Memory spaces are explicit, while data movement is somehow hidden and up to the driver and therefore too transparent.

  8. #8
    Junior Member
    Join Date
    Apr 2011
    Posts
    20

    Re: Non-blocking Write Always Completes Without Error

    Hey David

    I implemented the context notification callback in my wrapper now and it gets called: "CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_WRITE_BUFFER on Quadro FX 880M (Device 0)."

    At least there is some sort of notification now. But there is no way to actually relate the message to a specific operation/wait object.

    I then tried executing the very same code on my desktop machine (2x ATI HD6970), and there it behaves more as expected: clEnqueueWriteBuffer synchronously returns out of memory after the expected number of iterations. I would have much preferred the call to succeed and the wait handle's execution status to contain the error, but it's still much better than what NVidia seems to be doing.

    On a further note, I realized that my laptop GPU only supports OpenCL 1.0. Maybe what I observed only holds for their 1.0 implementation. I don't have an NVidia GPU capable of 1.1 lying around; otherwise I'd have tested it and possibly reported the issue.

    There is one remaining issue, though. I can now correctly allocate and write to four buffers of 256MB each on one device, but the last write takes much longer to finish. Using the built-in profiling facility, the difference between finish and submit time is nearly constant at around 0.25 s for the first three calls, and then increases to 1.5 s for the fourth write. It then fails when trying to allocate and write to a fifth buffer, which is expected.

    Do you have any explanation for the decrease in performance for the last successful operation?

  9. #9
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Non-blocking Write Always Completes Without Error

    Quote
    I can now correctly allocate and write to four buffers of 256MB each on one device, but the last write takes much longer to finish. Using the built-in profiling facility, the difference between finish and submit time is nearly constant at around 0.25 s for the first three calls, and then increases to 1.5 s for the fourth write. It then fails when trying to allocate and write to a fifth buffer, which is expected.
    Your GPU has 1GB of graphics memory, right? 4x256MB = 1GB. These buffers would consume all of the physically available memory. It looks like the driver is doing all it can to fit them in there, including swapping out whatever other internal buffers they have. I'm actually surprised that the fourth allocation succeeded, even if it took a bit longer.

  10. #10
    Junior Member
    Join Date
    Apr 2011
    Posts
    20

    Re: Non-blocking Write Always Completes Without Error

    Quote Originally Posted by david.garcia
    Your GPU has 1GB of graphics memory, right? 4x256MB = 1GB. These buffers would consume all of the physically available memory. It looks like the driver is doing all it can to fit them in there, including swapping out whatever other internal buffers they have. I'm actually surprised that the fourth allocation succeeded, even if it took a bit longer.
    It has 2GB of memory, of which OpenCL exposes 1GB as global memory.

    I am programming a part of a compiler that is responsible for offloading some computations to GPU when the operation is assumed to run faster on GPU. I am therefore dependent on predictions as to how long data movement will take.

    Is there any way to check how much space is left on a GPU? Or tell OpenCL not to offload memory already allocated for other purposes?
