
Thread: Performance Questions with Regard to Image Processing

  1. #1
    Newbie
    Join Date
    Jul 2013
    Posts
    2


    Hello all! I am trying to write a FAST (Features from Accelerated Segment Test) corner-detection function in OpenCL, but just copying the image into an OpenCL buffer and running an empty kernel takes 1-2 milliseconds. I feel like I am doing something wrong (I'm pretty new to OpenCL), but I'm stumped. I was hoping someone could give me some direction or pointers.

    Code :
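        // Every OpenCL call returns a cl_int status; without capturing it,
        // ErrorCheck() below would test a stale or uninitialized value.
        cl_int err;

        // Non-blocking copy of the image bytes into the device buffer
        err  = clEnqueueWriteBuffer(commands, input,
                                    CL_FALSE, 0, DATA_SIZE,
                                    Image->data(), 0, NULL, NULL);

        // Non-blocking copy: numResults must stay valid until the write completes
        err |= clEnqueueWriteBuffer(commands, outputSize,
                                    CL_FALSE, 0, sizeof(int),
                                    numResults, 0, NULL, NULL);

        // Stride of image data; argument sizes must match the kernel's parameter types
        err |= clSetKernelArg(kernel, 3, sizeof(unsigned int), &Stride);
        err |= clSetKernelArg(kernel, 5, sizeof(unsigned char), &Threshhold);
        err |= clSetKernelArg(kernel, 6, sizeof(int), &Height);
        ErrorCheck(err, "Error: Failed to set kernel arguments! ");

        // Block until every queued command has finished
        clFinish(commands);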

    This particular piece of code takes 0.5-4 milliseconds (usually closer to 1 ms) with exactly the same data size every time (a byte array of a 1280x720 image). That is troubling, because the single-threaded CPU function takes 1 millisecond to run the whole FAST algorithm. Am I just not going to be able to match the speed of the CPU, or am I passing data around wrong? I'd be glad to post any other relevant pieces of code; I just didn't want to flood the thread with my whole function XD
    Last edited by comwizz2; 07-06-2013 at 09:55 AM.

  2. #2
    Senior Member
    Join Date
    Oct 2012
    Posts
    165
    To compare execution times, you have to measure the kernel time alone using the event system (create the queue with CL_QUEUE_PROFILING_ENABLE and read CL_PROFILING_COMMAND_START/END from the kernel's event with clGetEventProfilingInfo). If you compare the whole process on the CPU and the GPU, the CPU implementation might be faster for a small problem size, because the PCIe communication with the GPU takes time too. One way to verify that your GPU is fast is to increase the problem size.
    Don't try to compare 100x100 px images on the CPU and GPU; there is too much constant overhead for the GPU to win that race.
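    A rough cost model makes the overhead argument concrete. Treat one round trip as a fixed cost (driver, launch, synchronization) plus a bandwidth-limited transfer plus the kernel itself. The symbols, the one-byte-per-pixel assumption, and the 6 GB/s effective bandwidth are illustrative assumptions, not measurements from this thread:

```latex
t_{\text{total}} \approx t_{\text{fixed}} + \frac{S}{B} + t_{\text{kernel}}

S_{1280\times720} = 1280 \times 720 \times 1\,\text{B} = 921\,600\,\text{B},
\qquad \frac{S}{B} \approx \frac{921\,600\,\text{B}}{6\times10^{9}\,\text{B/s}} \approx 0.15\,\text{ms}

S_{100\times100} = 100 \times 100 \times 1\,\text{B} = 10\,000\,\text{B},
\qquad \frac{S}{B} \approx \frac{10\,000\,\text{B}}{6\times10^{9}\,\text{B/s}} \approx 1.7\,\mu\text{s}
```

    For the small image the transfer term is negligible, so t_fixed dominates whatever the kernel does; as S grows, the S/B and t_kernel terms take over and the fixed cost is amortized, which is where a GPU can pull ahead.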
