Results 1 to 4 of 4

Thread: OpenCL CPU issue: execution stop too early

  1. #1
    Junior Member
    Join Date
    Apr 2012
    Posts
    9

    OpenCL CPU issue: execution stop too early

    Hello,

    I am a beginner with OpenCL, and I am trying to run a basic kernel, in the aim of processing image analysing function on GPU. For now I just want to try to execute the following kernel, on CPU Intel Sandy Bridge. I am working on visual studio 2010.

    The problem is that my kernel is not executed on the whole data. However, trying to debug, I understood that leaving a printf into the kernel make it working perfectly x). I assume it is because it slows down the execution...

    Here is the kernel:

    Code :
    __kernel void ker_FLADIntra_sum_c(
    const int stride, 
    __global int* pix,
    __global int* p_flad_sum)
    {
    int i, j;
    j = get_global_id(0);
    i= get_global_id(1);
     
    /*printf which make it working*/
    //printf("j=%d\ti=%d\tsum=%d\n", j, i, *p_flad_sum);
     
     *p_flad_sum+= abs(pix[j*stride + i] - pix[(j+1)*stride + i]);
    }

    The host program is mainly based on OpenCL SDK example, and Intel tutorial. Here is how I manage memory and run the kernel:

    Code :
        //create OpenCL buffer using input array memory
        g_inputBuffer = clCreateBuffer(g_context, CL_MEM_READ_ONLY, sizeof(cl_int) * arraySize, NULL, NULL);
    	g_flad_sum = clCreateBuffer(g_context, CL_MEM_READ_WRITE, sizeof(cl_int), NULL, NULL);
     
        if (g_inputBuffer == (cl_mem)0)
        {
            printf("ERROR: Failed to create input data Buffer\n");
            return false;
        }
     
    	err = clEnqueueWriteBuffer( g_cmd_queue, g_inputBuffer, CL_TRUE, 0, sizeof(cl_int) * arraySize, inputArray, 0, NULL, NULL );
    	err = clEnqueueWriteBuffer( g_cmd_queue, g_flad_sum, CL_TRUE, 0, sizeof(cl_int), p_flad_sum, 0, NULL, NULL );
     
    	err  = clSetKernelArg(g_kernel, 0, sizeof(cl_int), (void *) &stride);
        err |= clSetKernelArg(g_kernel, 1, sizeof(cl_mem), (void *) &g_inputBuffer);
        err |= clSetKernelArg(g_kernel, 2, sizeof(cl_mem), (void *) &g_flad_sum);
        if (err != CL_SUCCESS)
        {
            printf("ERROR: Failed to set input kernel arguments\n");
            return false;
        }
     
     
                // set work-item dimensions
    			size_t global_work_size[2];
                 global_work_size[0] = (size_t) stride-1;	//number of quad items in input array
    			 global_work_size[1] = (size_t) stride;	//skip the last row not to go out of allocation
    			 nd=2; // execute kernel (2D)
     
             if (CL_SUCCESS != clEnqueueNDRangeKernel(g_cmd_queue, g_kernel, nd, NULL, global_work_size, NULL, 0, NULL, NULL))
             {
                    printf("ERROR: Failed to execute sorting kernel\n");
                    return false;
             }
     
    	err = clFinish(g_cmd_queue);
     
    	err = clEnqueueReadBuffer( g_cmd_queue, g_flad_sum, CL_TRUE, 0, sizeof(cl_int), p_flad_sum, 0, NULL, NULL );

    I also ran VTune to check the timeline threads behaviour with and without this printf:

    Without printf


    With
    http://img109.imageshack.us/img109/8...fladvtune2.png

    I first thought about the blocking or non-blocking memory read, but it is already on CL_TRUE.
    What am I missing?

    Thanks.

  2. #2
    Senior Member
    Join Date
    Aug 2011
    Posts
    271

    Re: OpenCL CPU issue: execution stop too early

    Quote Originally Posted by chritolb29
    Hello,

    *p_flad_sum+= abs(pix[j*stride + i] - pix[(j+1)*stride + i]);
    This wont work, you're trying to add to the same address concurrently: this leads to a race condition when you're using multiple threads. Do a search on 'parallel reduction' or look at sdk examples of similar functions to see how it needs to be done. If you don't understand 'race condition', also search on it; these are both very widely known issues.

    Using printf probably just serialises the threads or something and just makes it appear to work.

  3. #3
    Senior Member
    Join Date
    Aug 2011
    Posts
    271

    Re: OpenCL CPU issue: execution stop too early

    Quote Originally Posted by chritolb29
    *p_flad_sum+= abs(pix[j*stride + i] - pix[(j+1)*stride + i]);
    This wont work, you're trying to add to the same address concurrently: this leads to a race condition when you're using multiple threads. Do a search on 'parallel reduction' or look at sdk examples of similar functions to see how it needs to be done. If you don't understand 'race condition', also search on it; these are both very widely known issues.

    Using printf probably just serialises the threads or something and just makes it appear to be ok.

  4. #4
    Junior Member
    Join Date
    Apr 2012
    Posts
    9

    Re: OpenCL CPU issue: execution stop too early

    Thank you for your answer.
    Parallel reduction is a big issue indeed, I re-wrote my kernel, taking it into account and it works well now.

Similar Threads

  1. Replies: 3
    Last Post: 03-24-2012, 05:35 AM
  2. The execution in GPU is not the same that CPU
    By luizdrumond in forum OpenCL
    Replies: 3
    Last Post: 09-02-2011, 03:00 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •