
Thread: Time of clReleaseMemObject : strange behaviour

  1. #11
    Junior Member
    Join Date
    Aug 2009
    Posts
    21

    Re: Time of clReleaseMemObject : strange behaviour

    Quote Originally Posted by jonathan81
    Hi,
    In fact you will see the same problem with the vectorAdd example in the SDK.
    If you look at the profiler you will see many memcpyDtoH calls during the release of the memory.
    Moreover, with big data for the vector, the time to release the command queue is huge!
    Kernel Time : 0.18s
    Release Command Queue : 0.6s
    I think it's a bug in the NVIDIA driver.
    Thx
    J

    Can you post the source (host + client) so I can test the timings on my OS X / GT 9600?

  2. #12
    Junior Member
    Join Date
    Jul 2009
    Posts
    11

    Re: Time of clReleaseMemObject : strange behaviour

    It's just the vectorAdd example without the CPU computation and with a data size of 512 * 100 000.
    Then just look at the time taken by clReleaseCommandQueue.
    Thx
    J

  3. #13

    Re: Time of clReleaseMemObject : strange behaviour

    Apologies if I've missed something, but isn't this just that the memory object can't be released until the asynchronous computation is complete? As soon as you add the kernel to the mix, the work goes from:

    • Define memory objects A & B
    • Release unused objects


    which will probably be optimised to nothing, to:

    • Define memory objects A & B
    • Upload them to the card
    • Execute kernel
    • Wait for kernel to finish and then release host and GPU copies of memory objects


    In other words, it's not the release itself that takes the time, but the upload + kernel + release. The waiting just happens to be spent inside the clRelease call.
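
    To make that attribution argument concrete, here is a minimal toy model in plain C. It is not OpenCL and does not claim to show how any driver works; toy_queue and the toy_* functions are hypothetical names invented for this sketch. Work is only recorded when enqueued, and whichever call blocks first is the one that gets charged for it.

```c
#include <assert.h>

/* Toy model, NOT OpenCL: all names here are hypothetical.
   Enqueued work is only recorded; it is "executed" (charged) inside
   the first call that has to wait for it. */
typedef struct {
    int pending_units;     /* work enqueued but not yet waited for   */
    int waited_in_finish;  /* units charged to toy_finish()          */
    int waited_in_release; /* units charged to toy_release()         */
} toy_queue;

void toy_init(toy_queue *q)
{
    q->pending_units = 0;
    q->waited_in_finish = 0;
    q->waited_in_release = 0;
}

/* Asynchronous enqueue: returns at once, nothing is charged here. */
void toy_enqueue(toy_queue *q, int units)
{
    assert(units >= 0);
    q->pending_units += units;
}

/* Explicit synchronisation point: pays for everything pending. */
void toy_finish(toy_queue *q)
{
    q->waited_in_finish += q->pending_units;
    q->pending_units = 0;
}

/* A release that blocks on outstanding work: it pays for whatever
   nobody waited for earlier. */
void toy_release(toy_queue *q)
{
    q->waited_in_release += q->pending_units;
    q->pending_units = 0;
}
```

    In this model, enqueuing 8 units and then calling toy_release() charges all 8 units to the release. Calling toy_finish() first moves the same cost there and leaves the release with nothing to wait for.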

  4. #14
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Time of clReleaseMemObject : strange behaviour

    That certainly could be the issue.
    In general, to time CL code you need to do:

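    start = gettime();

    for (i = 0; i < 100; i++) {
        do_work();       // enqueue the CL commands being timed
    }
    clFinish(queue);     // block until everything enqueued has completed

    end = gettime();
    time = (end - start) / 100;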
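
    The timing pattern above can be sketched as a small reusable C helper. This is only an illustration under stated assumptions: time_per_iteration, step_fn, call_counts and the two count_* callbacks are hypothetical names, and the callbacks are stand-ins. In real code the enqueue callback would issue the CL commands and the finish callback would call clFinish on the queue.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime on strict compilers */
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: one step that operates on a user context. */
typedef void (*step_fn)(void *ctx);

/* Average wall-clock seconds per iteration. The clock is stopped only
   after finish() returns, so work that is still in flight is included. */
double time_per_iteration(step_fn enqueue, step_fn finish,
                          void *ctx, int iterations)
{
    struct timespec start, end;
    double elapsed;

    assert(iterations > 0);

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; ++i)
        enqueue(ctx);            /* may return before the work is done */
    finish(ctx);                 /* synchronisation point               */
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed = (double)(end.tv_sec - start.tv_sec)
            + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return elapsed / iterations;
}

/* Stand-in callbacks that only count how often they are called. */
typedef struct {
    int enqueued;
    int finished;
} call_counts;

void count_enqueue(void *ctx) { ((call_counts *)ctx)->enqueued++; }
void count_finish(void *ctx)  { ((call_counts *)ctx)->finished++; }
```

    Calling time_per_iteration(count_enqueue, count_finish, &counts, 100) runs the enqueue step 100 times and the finish step once, which mirrors the loop above.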

    If you don't call clFinish, work could still be executing when you take the end time. In particular, if you have:

    clEnqueue()
    clRead()
    clRelease()

    it's possible the release will wait for the enqueue and read to finish. However, I'd still consider this a performance bug since the release should just decrement the internal retain count on the memory object and the runtime should have incremented it if it needs to keep it around for the read. So the actual release might not happen until later, but I'd expect clRelease() to execute really quickly all the time.

    There is a subtle issue: if you release a memory object allocated with a host pointer, you don't know when the runtime is done with the object, so you can't reliably free your host pointer. Apple has an extension "clSetMemObjectDestructorAPPLE" that allows you to get a callback when it is safe to do the free. I'd expect something like this to make it into the standard in the future.
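
    The retain-count behaviour and the destructor callback described above can be modelled together in a few lines of plain C. This is a toy sketch, not the OpenCL runtime: toy_mem, the toy_mem_* functions and mark_host_ptr_free are hypothetical names. For reference, OpenCL 1.1 later standardised a callback of this kind as clSetMemObjectDestructorCallback.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model, NOT OpenCL: all names here are hypothetical. */
typedef void (*destructor_fn)(void *user_data);

typedef struct {
    int refcount;             /* application and runtime references   */
    void *host_ptr;           /* caller-owned memory backing the object */
    destructor_fn on_destroy; /* called when the last reference drops */
    void *user_data;
} toy_mem;

toy_mem *toy_mem_create(void *host_ptr, destructor_fn on_destroy,
                        void *user_data)
{
    toy_mem *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->refcount = 1;          /* the application's reference */
    m->host_ptr = host_ptr;
    m->on_destroy = on_destroy;
    m->user_data = user_data;
    return m;
}

/* The runtime would take a reference while a pending command uses m. */
void toy_mem_retain(toy_mem *m)
{
    assert(m->refcount > 0);
    m->refcount++;
}

/* Cheap in the common case: only decrements. Returns 1 if this call
   destroyed the object, 0 if other references are still outstanding. */
int toy_mem_release(toy_mem *m)
{
    assert(m->refcount > 0);
    if (--m->refcount > 0)
        return 0;
    if (m->on_destroy != NULL)
        m->on_destroy(m->user_data); /* host_ptr may be freed from now on */
    free(m);
    return 1;
}

/* Example callback: sets an int flag once the host pointer is free. */
void mark_host_ptr_free(void *user_data)
{
    *(int *)user_data = 1;
}
```

    In this model the application's release returns immediately while the runtime still holds its reference, and the callback fires only when the runtime drops the last one. That is the point at which freeing the host pointer becomes safe.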

  5. #15
    Junior Member
    Join Date
    Jul 2009
    Posts
    11

    Re: Time of clReleaseMemObject : strange behaviour

    @PaulS:
    I do use clFinish in my case. Just run a profiler on the vectorAdd example and you will see that when you release your data or your command queue there are several memcpyDtoH calls, which are unnecessary for releasing the memory because I don't use the CL_MEM_USE_HOST_PTR option...
    It's a bug in the NVIDIA driver.
    Thx
    J
