Results 1 to 6 of 6

Thread: Non-blocking write buffer problem with multiple contexts

  1. #1
    Junior Member
    Join Date
    Aug 2011
    Posts
    20

    Non-blocking write buffer problem with multiple contexts

    I encounter the problem with non-blocking clEnqueueWriteBuffer when I use multiple contexts concurrently. Within a program I run, there is one in-order-execution cl_command_queue and one cl_context. In each program, there is at least one gpu task, and tasks can run concurrently. Note that tasks within one program use the same command queue. I run multiple programs at the same time, and sometimes some program generate wrong outputs.

    The following code is one gpu task:
    Code :
    cl_mem _clmem1 = clCreateBuffer(context, CL_MEM_READ_WRITE, n * sizeof(float), NULL, &err);
    cl_mem _clmem2 = clCreateBuffer(context, CL_MEM_READ_WRITE, n * sizeof(float), NULL, &err);
    clEnqueueWriteBuffer(queue, _clmem1, CL_FALSE, 0,n * sizeof(float), input, 0, NULL, NULL); //non-blocking write
    clSetKernelArg(clkern, 0, sizeof(cl_mem), &_clmem1);
    clSetKernelArg(clkern, 1, sizeof(cl_mem), &_clmem2);
    size_t workdim[] = {N};
    clEnqueueNDRangeKernel(queue, clkern, 1, 0, workdim, NULL, 0, NULL, NULL );
    clEnqueueReadBuffer(queue, _clmem, CL_FALSE, 0, n * sizeof(float), output, 0, NULL, &eventout); //non-blocking read
    {
    clGetEventInfo(eventout, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &ret, NULL);
    }while(ret != CL_COMPLETE);
    print output
    clReleaseMemObject(_clmem1);
    clReleaseMemObject(_clmem2);

    I run some experiments to find out what's wrong. When these 2 conditions apply, the programs sometimes don't do the right things:
    1) non-blocking write buffer
    2) multiple contexts (or command queues)
    When only one program runs at a time (one context), it always does the right thing. When multiple programs run concurrently but with blocking write, they always do the right thing as well.

    To narrow down the problem a bit more, I call the blocking read for "_clmem1" before call clEnqueueNDRangeKernel to see where the data get changed. I find out that the data is different from "input" before the kernel is run, and the "input" which resides on the host memory is till the same. Therefore, there is something wrong with clEnqueueWriteBuffer.

    I test more to see weather it's really because of the multiple contexts, so now I run only one program that has one context but multiple command queues. The result is it also sometimes generate a wrong output.

    I'm using OpenCL 1.0 in CUDA 3.2 on NVIDIA driver 260.19.36. My machine is Linux x86_64.

    Thank you so much for reading this long description of the problem I encounter. I'm really appreciated your attempt to help. I'll be super happy if anyone knows what's going on and gives me suggestions of how to make the programs work properly. It's very crucial for me to make this work.

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Non-blocking write buffer problem with multiple contexts

    First, the term "task" has a different meaning in OpenCL. What you describe are multiple commands, such as clEnqueueWriteBuffer, clEnqueueNDRangeKernel, etc.

    Second, there is no synchronization between different contexts. If you use multiple contexts, commands will be executed in any order.

    Also, there is no synchronization between different command queues even if they are in the same context. This is a simplification but it's good enough for today.

    I strongly recommend calling clFinish() before calling before calling clGetEventInfo(). That way you don't need to call clGetEventInfo() inside a loop.

    Finally, is it possible that you are freeing variable "input" right after calling clEnqueueWriteBuffer()? When do you free that variable?
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date
    Aug 2011
    Posts
    20

    Re: Non-blocking write buffer problem with multiple contexts

    Thank you so much for responding.

    Quote Originally Posted by david.garcia
    First, the term "task" has a different meaning in OpenCL. What you describe are multiple commands, such as clEnqueueWriteBuffer, clEnqueueNDRangeKernel, etc.
    Thanks for clarification, and sorry for misusing the term.

    Quote Originally Posted by david.garcia
    Second, there is no synchronization between different contexts. If you use multiple contexts, commands will be executed in any order.
    Yes, I do know that. However, each program has its own context, and the programs are independent, so I don't need any kind of synchronization between them.

    Quote Originally Posted by david.garcia
    Also, there is no synchronization between different command queues even if they are in the same context. This is a simplification but it's good enough for today.
    Same thing, when I use different command queues on the same context, they are completely independent, or else I explicitly pass the events as event_wait_list in the function.

    Quote Originally Posted by david.garcia
    I strongly recommend calling clFinish() before calling before calling clGetEventInfo(). That way you don't need to call clGetEventInfo() inside a loop.
    The above code that I showed is only parts of my program. The real code is extremely complicated. Anyhow, the reason I use clGetEventInfo is that when the event is not complete, the cpu thread can go do something else while waiting. If I use clFinish(), then it will behave like blocking read which defeats the purpose of what I'm trying to do.

    Quote Originally Posted by david.garcia
    Finally, is it possible that you are freeing variable "input" right after calling clEnqueueWriteBuffer()? When do you free that variable?
    I'm pretty sure I don't because after the everything finishes (after clEnqueueReadBuffer is complete), I print the input, and it still have the same value.

  4. #4
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Non-blocking write buffer problem with multiple contexts

    However, each program has its own context, and the programs are independent, so I don't need any kind of synchronization between them.
    Are we talking about different applications here? That is, different processes? When you say "program" it makes me think of OpenCL program object.

    If there's an issue where running one process works fine but running multiple processes causes one of them to fail, that sound like a bug in the driver. You may want to contact your hardware vendor. They will need a short application that reproduces the issue.

    Anyhow, the reason I use clGetEventInfo is that when the event is not complete, the cpu thread can go do something else while waiting.
    That's cool. You will still need a call to clFlush() because otherwise there's no guarantee that the event will ever complete. Your code could enter an infinite loop.

    The above code that I showed is only parts of my program. The real code is extremely complicated.
    Can you reproduce the issue even with the simplified code you showed us? The problem could be somewhere else.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  5. #5
    Junior Member
    Join Date
    Aug 2011
    Posts
    20

    Re: Non-blocking write buffer problem with multiple contexts

    Quote Originally Posted by david.garcia
    Are we talking about different applications here? That is, different processes? When you say "program" it makes me think of OpenCL program object.

    If there's an issue where running one process works fine but running multiple processes causes one of them to fail, that sound like a bug in the driver. You may want to contact your hardware vendor. They will need a short application that reproduces the issue.
    Yeah, sorry I didn't make it clear. By program, I mean application. One application has one context and one command queue (but it is multi-threading), and only one thread (gpu manager thread) invokes calls to gpu. When I run one application, it works find, but when I run more than one, some of them generate wrong outputs.

    Quote Originally Posted by david.garcia
    That's cool. You will still need a call to clFlush() because otherwise there's no guarantee that the event will ever complete. Your code could enter an infinite loop.
    I see. I'll do that.

    Quote Originally Posted by david.garcia
    Can you reproduce the issue even with the simplified code you showed us? The problem could be somewhere else.
    I haven't tried, but I should. That's a very good suggestion. I'll let you know what the outcome is. Thanks a lot.

  6. #6
    Junior Member
    Join Date
    Aug 2011
    Posts
    20

    Re: Non-blocking write buffer problem with multiple contexts

    It's fixed. The bug is on my side. Nothing to do with gpu. Thank you so much for helping anyway.

Similar Threads

  1. multiple contexts and devices
    By brdavs in forum OpenCL
    Replies: 1
    Last Post: 01-24-2013, 02:02 PM
  2. Replies: 12
    Last Post: 04-04-2011, 03:30 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •