Results 1 to 3 of 3

Thread: Deadlock occurs with N threads, N command queues & N Kernels

  1. #1
    Newbie
    Join Date
    May 2013
    Posts
    2

    Deadlock occurs with N threads, N command queues & N Kernels

    Hi there,

    I have a rather simple case of one program with one kernel that is being called by many threads. At each call site, the thread acquires a command queue and kernel object that are not used by any other threads. It then allocates a read only and write only buffers. This is shown in the code below. After a while, the application will lock up and all threads and stopped at wait on single object (Windows 7). It looks like most to all threads are inside of clEnqueueWriteBuffer or clReleaseMemObject. The latter surprises me as I would expect clReleaseMemObject to really not do much. Is there anything wrong the pattern that I am using? I am using AMD OpenCL.

    Thx,
    S.

    Code :
    		program = getProgram( programId ); // return already create program
    		kernel = aquireKernel( program, kernelID ); // only one thread can use this kernel object at any one time
     
    		entryCount = width * height;
    		inputSize = entryCount * channelCount * sizeof( inType );
    		outputSize = entryCount * channelCount * sizeof( outType );
     
    		commandQueue = aquireCommandQueue(); // only one thread can use this command queue at any one time
     
    		err = CL_SUCCESS;
     
    		inMem = clCreateBuffer( openclContext, CL_MEM_READ_ONLY, inputSize, NULL, &err );
    		err |= clEnqueueWriteBuffer( commandQueue, inMem, CL_TRUE, 0, inputSize, in, 0, NULL, NULL ); // often locks up
    		assert( err == CL_SUCCESS );
     
    		outMem = clCreateBuffer( openclContext, CL_MEM_WRITE_ONLY, outputSize, NULL, &err );
     
    		err |= clSetKernelArg( kernel,  0, sizeof(cl_mem), &inMem );
    		err |= clSetKernelArg( kernel,  1, sizeof(cl_mem), &outMem );
    		err |= clEnqueueNDRangeKernel( commandQueue, kernel, 1, NULL, &entryCount, NULL, 0, NULL, NULL );
     
    		// actual enqueue call == render
    		err |= clFinish( commandQueue );
     
    		// read buffer back
    		err |= clEnqueueReadBuffer( commandQueue, outMem, CL_TRUE, 0, outputSize, out, 0, NULL, NULL );
    		assert( err == CL_SUCCESS );
     
    		err |= clReleaseMemObject( outMem ); // often locks up
    		err |= clReleaseMemObject( inMem );
     
    		releaseCommandQueue( commandQueue );
     
    		releaseKernel( kernelID, kernel );

  2. #2
    Newbie
    Join Date
    May 2013
    Posts
    2
    I have tried different things, some of which seem to help:

    1. Use thread local storage so that there is 1 to 1 mapping between thread and command queue. Buffers are still allocated & deallocated after running the kernels
      • Did not seem to make a difference
    2. Use thread local storage so that there is 1 to 1 mapping between thread and command queue but buffers are re-used instead of being deallocated
      • Runs a lot longer but eventually crashes trying to do a copy deep inside open amdocl64.dll. I was not able to track the source of the memory.
    3. Use thread local storage and protect entire code around a critical section
      • That seems to work but that completely destroys the parallelism. Performance becomes horrendous.


    At this point, I think it will be better to have a single thread do the scheduling and use events to figure out when the work is done.

    Thoughts?

  3. #3
    Junior Member
    Join Date
    Dec 2011
    Posts
    25
    This might help you with some info as to how threadsafe the OpenCL APIs are: http://www.khronos.org/message_board...eue-and-device .

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •