Thread: memory buffer question

  1. #1

    memory buffer question

    Hi,

    I am learning OpenCL, and I noticed that most applications I have seen so far create the memory buffer with clCreateBuffer and then use clEnqueueWriteBuffer to get the data in. Why do they not pass the data straight to clCreateBuffer instead of doing it in two steps? Is there a performance benefit or something?

    Thanks in advance,

    kind regards,

    Tim

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: memory buffer question

    Quote Originally Posted by t.verstraete
    Why do they not pass the data straight to clCreateBuffer instead of doing it in two steps? Is there a performance benefit or something?
    No, there's no performance benefit in doing this in two steps. You can call clCreateBuffer() with the CL_MEM_COPY_HOST_PTR flag, passing your data as host_ptr, and do both creation and copy in one step.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer.

  3. #3

    Re: memory buffer question

    Thanks for your reply. That was indeed what I thought, but I was not sure.

    Another question, though. Say I have the following setup:

    kernel1 + command queue1 + buffer1 + buffer2 + event1
    kernel2 + command queue1 + buffer2 + buffer3 + wait for event1

    By using event1 to sync between the two kernels, where the second uses the output of the first, we have memory consistency within the device. Is this correct?

  4. #4
    Senior Member
    Join Date
    Aug 2011
    Posts
    271

    Re: memory buffer question

    Quote Originally Posted by t.verstraete
    Thanks for your reply. That was indeed what I thought, but I was not sure.

    Another question, though. Say I have the following setup:

    kernel1 + command queue1 + buffer1 + buffer2 + event1
    kernel2 + command queue1 + buffer2 + buffer3 + wait for event1

    By using event1 to sync between the two kernels, where the second uses the output of the first, we have memory consistency within the device. Is this correct?
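    That should work. But if you're using a default in-order queue, the event is also unnecessary: commands in an in-order queue execute in the order they were enqueued, so kernel2 cannot start before kernel1 has finished.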

    Event synchronisation becomes necessary when using multiple queues, multiple devices, or out-of-order queues.

  5. #5
    Junior Member
    Join Date
    Mar 2012
    Posts
    3

    Re: memory buffer question

    Quote Originally Posted by notzed
    That should work. But if you're using a default in-order queue, then it is also unnecessary.

    Event synchronisation becomes necessary when using multiple queues, multiple devices, or out-of-order queues.
    I have some questions related to this when 2 devices are used:

    1- If he had multiple devices, with kernel1 run on device1 and kernel2 to be run on device2, and he uses CL_MEM_COPY_HOST_PTR, does buffer1 get copied to device2 too?

    2- The code waits for event1, which is tied to kernel1's execution. When kernel1 finishes, is the resulting buffer2 guaranteed to be available to device2 right away?

    3- After kernel1 starts, but before it finishes, a host thread changes the contents of the host memory where buffer3 is located. Do we have to enqueue a write before running kernel2?

    4- If multiple devices are used with multiple queues, does it matter which queue is used when reading buffer3 back to host memory? (We wait for kernel2's events to finish.)

    Thanks!

  6. #6
    Senior Member
    Join Date
    Aug 2011
    Posts
    271

    Re: memory buffer question

    All but 4 can be answered by simply reading the specification (and the answer is yes, assuming you stick to the rules and do it properly).

    For 3, if buffer3 is used by kernel1 then you're breaking the rules.

    For 4, try the archives. From memory, it shouldn't matter for getting valid data (assuming all the synchronisation is correct), but it may affect performance. Intuitively, the last device to write to the data will be the best one to read it from, and I would expect that to be at least a reasonable way to approach it.

  7. #7
    Junior Member
    Join Date
    Mar 2012
    Posts
    3

    Re: memory buffer question

    Quote Originally Posted by notzed
    All but 4 can be answered by simply reading the specification (and the answer is yes, assuming you stick to the rules and do it properly).
    It is not very clear to me, or I couldn't find where this is clearly explained (don't say the appendix, because I have been there).

    Also,

    Quote Originally Posted by notzed
    For 3, if buffer3 is used by kernel1 then you're breaking the rules.
    No, it is only used by kernel2. The question boils down to exactly when the buffer is copied to the device: after creating the buffer, it might or might not be copied to the device, right? So the safest bet would be to use an enqueue write if the data was changed after the buffer was created, even though that buffer has not been used yet?


    Quote Originally Posted by notzed
    For 4, try the archives. From memory, it shouldn't matter for getting valid data (assuming all the synchronisation is correct), but it may affect performance. Intuitively, the last device to write to the data will be the best one to read it from, and I would expect that to be at least a reasonable way to approach it.
    Yes, that makes some sense. But for number 2 you said yes, which would imply that OpenCL does not finish kernel execution (or does not start a new operation?) until the shared buffers are 100% synchronized between devices. If that is the case, then there would be no performance advantage or penalty in using any particular device to read the resulting data. Would you agree?

  8. #8
    Senior Member
    Join Date
    Aug 2011
    Posts
    271

    Re: memory buffer question

    Quote Originally Posted by yurtesen
    Quote Originally Posted by notzed
    All but 4 can be answered by simply reading the specification (and the answer is yes, assuming you stick to the rules and do it properly).
    It is not very clear to me, or I couldn't find where this is clearly explained (don't say the appendix, because I have been there).
    Well, I will tell you to go to the appendix: in fact, the very first page of the appendix (A.1) answers most of your questions. Have you really read it?

    also from the spec:
    1. section 5.2 for buffer creation flags, conventions.

    2. section 5.11, second paragraph about ordering of kernels and data movement.

    3. doesn't actually have enough info to tell me what you're doing. If you've used CL_MEM_COPY_HOST_PTR, then whatever 'buffer' you're referring to is explicitly un-referenced by OpenCL (again, section 5.2.1), so a write is obviously required.

    The only way OpenCL can possibly use a buffer you have allocated is when you have CL_MEM_USE_HOST_PTR set, and then the specification clearly states that the driver might cache it elsewhere, so a write is required there too. If you search for USE_HOST_PTR throughout the spec, it lists all the various conditions.

    Also,

    Quote Originally Posted by notzed
    For 3, if buffer3 is used by kernel1 then you're breaking the rules.
    No, it is only used by kernel2. The question boils down to exactly when the buffer is copied to the device: after creating the buffer, it might or might not be copied to the device, right? So the safest bet would be to use an enqueue write if the data was changed after the buffer was created, even though that buffer has not been used yet?
    As above: unless you're using CL_MEM_USE_HOST_PTR, your buffer is nothing to OpenCL after the buffer creation call. And given that the API has no 'wait' event for it, this implies the CL_MEM_COPY_HOST_PTR copy is synchronous (i.e. immediate).

    And if you are using use_host_ptr, then you have to write anyway.

    It's not a 'safest bet', it's simply the defined API contract ...
    Quote Originally Posted by notzed
    For 4, try the archives. From memory, it shouldn't matter for getting valid data (assuming all the synchronisation is correct), but it may affect performance. Intuitively, the last device to write to the data will be the best one to read it from, and I would expect that to be at least a reasonable way to approach it.
    Yes, that makes some sense. But for number 2 you said yes, which would imply that OpenCL does not finish kernel execution (or does not start a new operation?) until the shared buffers are 100% synchronized between devices. If that is the case, then there would be no performance advantage or penalty in using any particular device to read the resulting data. Would you agree?
    I said nothing about kernels finishing execution; your wording is wrong, but I didn't feel the need to correct it. It simply doesn't start the operation until the data is where the operation is going to run, and it doesn't copy the data until the kernel that writes to it has finished with it. How else could it possibly do it? Bend time?

    I don't work on drivers, so I have no idea whether there is a performance penalty, but that's not to say there isn't one. The API only guarantees the order of execution; inferring anything further is just guessing.
