Thread: OpenCL program freezes when high number of kernels are launched within a loop

  1. #1
    Newbie
    Join Date
    Oct 2013
    Posts
    3

    Hi,

    I have a loop of about 1 billion iterations that launches OpenCL kernels. Each launch uses a single work-item and performs a trivial operation. After a few million iterations the program hangs and never terminates. It hangs inside the call to clFinish(), and not always at the same iteration.

    The problem disappears if clFinish() is called once every 1000 iterations instead of on every iteration. My guess is that clFinish() is waiting for a kernel to finish, but the kernel is somehow killed before clFinish() is called. Note also that the problem disappears when I insert many printf() calls inside the loop!

    I see the problem when running on a CPU device (on my laptop, with the AMD SDK) and also on a machine with an Nvidia Fermi GPU (Nvidia SDK and drivers; the AMD SDK is also installed on that machine).

    I check for errors after each OpenCL API call, but none is reported.

    My questions:
    - Is there any incorrect use of the OpenCL API below?
    - Is there any problem if a huge number of OpenCL kernels are launched simultaneously?

    Host code:
    Code :
        /* OpenCL initialization. */
        /* ... */
        /* Error checks are omitted here. */
        cl_mem dev_acc = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(double), NULL, &err);

        for (int h0 = 1; h0 <= ni; h0 += 1)
            for (int h2 = 0; h2 < nj; h2 += 1)
                for (int h5 = 0; h5 < h2 - 1; h5 += 1) {
                    size_t global_work_size[1] = {1};
                    size_t block_size[1] = {1};
                    /* A kernel object is created and released on every iteration. */
                    cl_kernel kernel2 = clCreateKernel(program, "kernel2", &err);
                    clSetKernelArg(kernel2, 0, sizeof(cl_mem), (void *) &dev_acc);
                    clEnqueueNDRangeKernel(queue, kernel2, 1, NULL, global_work_size, block_size, 0, NULL, NULL);
                    clFinish(queue);
                    clReleaseKernel(kernel2);
                }

    Kernel code:
    Code :
    __kernel void kernel2(__global double *acc)
    {
          *acc = 1;
    }

    Compilation:
    gcc -O3 -lm -std=gnu99 polybench.c ocl_utilities.c symm_host.c -lOpenCL -lm -I/opt/AMDAPP/include -L/opt/AMDAPP/lib/x86_64

    I'm using Ubuntu 12.04, Kernel 3.2.0-29-generic, X86_64, RAM: 2 GB
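
To put the "about 1 billion iterations" figure in context, the loop nest above launches one kernel per innermost iteration, so the launch count follows directly from `ni` and `nj`. The sketch below is only an illustration: the function names `count_launches`, `count_launches_bruteforce`, and `launch_count_self_check` are hypothetical and not part of the original program, and the post does not state the actual values of `ni` and `nj`.

```c
#include <assert.h>

/* Hypothetical helper: number of times the innermost loop body
 * (one kernel launch plus one clFinish) runs for given ni and nj. */
unsigned long long count_launches(unsigned long long ni, unsigned long long nj)
{
    /* For each h2, h5 runs over 0 .. h2-2, which is max(0, h2-1) iterations.
     * Summing over h2 = 0 .. nj-1 gives (nj-1)*(nj-2)/2 when nj >= 2. */
    if (nj < 2)
        return 0;
    return ni * ((nj - 1) * (nj - 2) / 2);
}

/* Brute-force version of the same count, mirroring the loop nest directly. */
unsigned long long count_launches_bruteforce(int ni, int nj)
{
    unsigned long long n = 0;
    for (int h0 = 1; h0 <= ni; h0 += 1)
        for (int h2 = 0; h2 < nj; h2 += 1)
            for (int h5 = 0; h5 < h2 - 1; h5 += 1)
                n += 1;
    return n;
}

/* Checks that the closed form agrees with the loop nest for small sizes. */
void launch_count_self_check(void)
{
    for (int ni = 0; ni < 5; ni += 1)
        for (int nj = 0; nj < 20; nj += 1)
            assert(count_launches((unsigned long long)ni, (unsigned long long)nj)
                   == count_launches_bruteforce(ni, nj));
}
```

As an example with made-up sizes, `count_launches(1000, 1000)` evaluates to 498501000, so sizes in that range already give several hundred million launch-and-wait round trips.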

  2. #2
    Newbie
    Join Date
    Oct 2013
    Posts
    3
    Any comments on this problem?

  3. #3
    Junior Member
    Join Date
    Oct 2011
    Location
    Seattle, WA
    Posts
    27
    I don't see any errors in your approach, so I wonder whether there is a bug in the library. Did you figure it out?
    I found your post because I'm interested in calling clSetKernelArg many times.

    In your example, you can move the kernel creation and argument setup outside the for loops, right? I wonder whether that would reduce resource pressure within the library and let the program run without freezing.

    Code :
        /* OpenCL initialization. */
        /* ... */
        cl_mem dev_acc = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(double), NULL, &err);
        /* The kernel object and its argument are set up once, outside the loops. */
        cl_kernel kernel2 = clCreateKernel(program, "kernel2", &err);
        clSetKernelArg(kernel2, 0, sizeof(cl_mem), (void *) &dev_acc);

        for (int h0 = 1; h0 <= ni; h0 += 1)
            for (int h2 = 0; h2 < nj; h2 += 1)
                for (int h5 = 0; h5 < h2 - 1; h5 += 1) {
                    size_t global_work_size[1] = {1};
                    size_t block_size[1] = {1};
                    clEnqueueNDRangeKernel(queue, kernel2, 1, NULL, global_work_size, block_size, 0, NULL, NULL);
                    clFinish(queue);
                }
        clReleaseKernel(kernel2);
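
The first post reports that calling clFinish() once every 1000 iterations, instead of on every iteration, made the freeze go away. A minimal sketch of that batching pattern is shown below; it is a workaround matching that observation, not a confirmed fix for the underlying cause. Everything named here (`step_fn`, `run_batched`, `batch_counters`, `count_enqueue`, `count_sync`) is hypothetical. In a real program the `enqueue` callback would wrap clEnqueueNDRangeKernel and the `sync` callback would wrap clFinish; the counting callbacks are stand-ins so the sketch runs without an OpenCL device.

```c
#include <assert.h>

/* Hypothetical callback type: returns 0 on success, a nonzero error
 * code otherwise. In a real program `enqueue` would wrap
 * clEnqueueNDRangeKernel and `sync` would wrap clFinish. */
typedef int (*step_fn)(void *ctx);

/* Enqueue `total` launches, synchronizing once every `batch` launches
 * and once more at the end if any launches are still pending.
 * Returns 0 on success or the first nonzero code from a callback. */
int run_batched(unsigned long long total, unsigned long long batch,
                step_fn enqueue, step_fn sync, void *ctx)
{
    assert(batch > 0);
    unsigned long long pending = 0;
    for (unsigned long long i = 0; i < total; i += 1) {
        int rc = enqueue(ctx);
        if (rc != 0)
            return rc;
        pending += 1;
        if (pending == batch) {
            rc = sync(ctx);
            if (rc != 0)
                return rc;
            pending = 0;
        }
    }
    return pending > 0 ? sync(ctx) : 0;
}

/* Stand-in state and callbacks that only count calls. */
struct batch_counters {
    unsigned long long enqueued;
    unsigned long long synced;
};

int count_enqueue(void *ctx)
{
    ((struct batch_counters *)ctx)->enqueued += 1;
    return 0;
}

int count_sync(void *ctx)
{
    ((struct batch_counters *)ctx)->synced += 1;
    return 0;
}
```

With the stand-in callbacks, `run_batched(2500, 1000, count_enqueue, count_sync, &c)` performs 2500 enqueues and 3 synchronizations: two full batches plus one final sync for the remaining 500 launches. Returning on the first nonzero code keeps the per-call error checking that the original poster describes.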

  4. #4
    Newbie
    Join Date
    Oct 2013
    Posts
    3
    I noticed that the problem appears only with the Nvidia OpenCL library; the same code works fine with the AMD OpenCL library. This makes me think the problem is in the library, but I don't have any proof.

    Have you experienced a similar problem?
