
Thread: How do opencl kernel perform?

  1. #1

    How do opencl kernel perform?

    I don't understand how an OpenCL kernel function is executed.
    I want to calculate something in an OpenCL kernel, so I use 256 work-items and want to obtain the final result.

    Does a single clEnqueueNDRangeKernel() call run all 256 work-items?
    Or do I need a for loop? (If I use a for loop, do I also have to repeat it 16*16 times?)

  2. #2

    Re: How do opencl kernel perform?

    If you configure your call to launch 256 work items, using the localSize and globalSize arguments, then a single clEnqueueNDRangeKernel() call will launch all those 256 threads. No need for a loop.
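
    To picture what that single call does, here is a plain C sketch. It is not real OpenCL: the hypothetical fake_enqueue_1d() stands in for the runtime and square_item() stands in for a kernel body, both invented for illustration. On a real device the invocations run in parallel and the kernel reads its id with get_global_id(0).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel body: one call handles one work-item. */
static void square_item(size_t gid, const int *in, int *out)
{
    out[gid] = in[gid] * in[gid];
}

/* Hypothetical stand-in for the runtime. It, not the caller, visits every
   global id from 0 to global_size - 1. A real device would run these
   invocations in parallel rather than in a loop. */
static void fake_enqueue_1d(size_t global_size, const int *in, int *out)
{
    assert(in != NULL && out != NULL);
    for (size_t gid = 0; gid < global_size; ++gid)
        square_item(gid, in, out);
}
```

    Calling fake_enqueue_1d(256, in, out) once fills all 256 outputs, which mirrors why the host code needs no loop of its own.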

  3. #3

    Re: How do opencl kernel perform?

    Thanks for your advice.

    I have one question. If I want 256 work-items, what should localSize and globalSize be?
    In my source, globalSize is currently 16*16 (= 256) and localSize is 16.
    Is that right or wrong?

  4. #4

    Re: How do opencl kernel perform?

    That is correct. The global size is the total number of threads you want, 256 in this case. Setting localSize to 16 will split these 256 threads into 256/16 = 16 groups.

    Note that both of these sizes may hold several values, one for each dimension of the NDRange. So setting the global size to 256 and work_dim to 1 gives you a consecutive range of thread ids from 0 up to, but not including, 256. Your question included a 16*16, which hints at a two-dimensional problem. If that is the case, then you may set the global size to [16,16] and work_dim to 2, which will spawn 256 threads in a 16-by-16 grid. The local size then needs two values as well.
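
    The group arithmetic above can be sketched in plain C. The helper name count_work_groups() is made up for this example, and the divisibility check assumes the OpenCL 1.x rule that each local size must evenly divide the matching global size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of work-groups in an NDRange.
   Assumes each local size must evenly divide the matching global size.
   Returns 0 when the sizes are not compatible. */
static size_t count_work_groups(unsigned work_dim,
                                const size_t *global,
                                const size_t *local)
{
    size_t groups = 1;
    assert(work_dim >= 1 && work_dim <= 3);
    for (unsigned d = 0; d < work_dim; ++d) {
        if (local[d] == 0 || global[d] % local[d] != 0)
            return 0;
        groups *= global[d] / local[d];
    }
    return groups;
}
```

    With work_dim 1, global {256} and local {16} this gives 16 groups. With work_dim 2, global {16, 16} and local {4, 4} it also gives 16 groups, each holding 4*4 = 16 work-items.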

  5. #5

    Re: How do opencl kernel perform?

    I set the global size to [16,16] and work_dim to 2.
    However, a problem occurred: the call fails with CL_INVALID_WORK_GROUP_SIZE.

    I tried two changes to the source code.

    1. first change
    Code :
    szGlobalWorkSize[0] = iWidth;    // iWidth = 16
    szGlobalWorkSize[1] = iHeight;   // iHeight = 16
    szLocalWorkSize[0] = NUM_THREADS;

    err = clEnqueueNDRangeKernel(oclHandles.queue,
                                 m_clHexEncodeKernel,
                                 2,      // work_dim value
                                 NULL,
                                 szGlobalWorkSize,
                                 szLocalWorkSize,
                                 0, NULL, NULL);

    2. second change
    Code :
    szLocalWorkSize[0] = iWidth;
    szLocalWorkSize[1] = iHeight;
    szGlobalWorkSize[0] = shrRoundUp((int)szLocalWorkSize[0], iWidth);

    What is wrong?

  6. #6

    Re: How do opencl kernel perform?

    In both cases, you set only one value in one of the size arrays. In 1), szLocalWorkSize[1] is still undefined, and in 2) the same is true for szGlobalWorkSize[1].

    It would probably help if you added prints right before clEnqueue... that did something like

    Code :
    printf("Local: %zu, %zu, %zu\n", <fill in here>);   // size_t values need %zu
    printf("Global: %zu, %zu, %zu\n", <fill in here>);
    printf("Dim: %u\n", work_dim);
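
    As a rough sketch of the fix, the plain C helpers below fill every entry of both arrays for a two-dimensional launch and print them before the enqueue call. The names round_up(), set_sizes_2d() and print_sizes_2d() are hypothetical, and round_up() is only assumed to behave like the shrRoundUp() call in the question (round the second argument up to a multiple of the first).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: round value up to the next multiple of `multiple`. */
static size_t round_up(size_t value, size_t multiple)
{
    assert(multiple > 0);
    size_t rem = value % multiple;
    return rem == 0 ? value : value + (multiple - rem);
}

/* Hypothetical helper: set BOTH entries of BOTH arrays for work_dim = 2,
   so that no element is left uninitialized. */
static void set_sizes_2d(size_t width, size_t height,
                         size_t local_x, size_t local_y,
                         size_t global[2], size_t local[2])
{
    local[0]  = local_x;
    local[1]  = local_y;
    global[0] = round_up(width,  local_x);
    global[1] = round_up(height, local_y);
}

/* Print the sizes right before enqueueing; size_t is printed with %zu. */
static void print_sizes_2d(const size_t global[2], const size_t local[2])
{
    printf("Local: %zu, %zu\n", local[0], local[1]);
    printf("Global: %zu, %zu\n", global[0], global[1]);
}
```

    When the global size is rounded up past the real width or height, the kernel has to skip the extra ids itself, for example by returning early when its id is out of range.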

  7. #7

    Re: How do opencl kernel perform?

    OK. I set szLocalWorkSize[1] and found the problem.

    I have another question.
    Is it possible to execute an OpenCL kernel function sequentially?

  8. #8

    Re: How do opencl kernel perform?

    Typically no. The main idea behind OpenCL is data parallelism, which kind of implies parallel threads. You could of course launch only one thread at a time, but that would be horribly inefficient.

    Different kernels (or the same kernel multiple times) can be run sequentially, if that was what you were asking.
