Hello all! I am trying to write a FAST (corner detection algorithm) function in OpenCL, but I am finding that just copying the memory to the OpenCL buffer and running an empty kernel takes 1-2 milliseconds. I feel like I am doing something wrong (I'm pretty new to OpenCL), but I'm stumped. I was hoping someone could give me some direction or pointers.
Code :
clEnqueueWriteBuffer(commands, input, CL_FALSE, 0, DATA_SIZE, Image->data(), 0, NULL, NULL);
clEnqueueWriteBuffer(commands, outputSize, CL_FALSE, 0, sizeof(int), numResults, 0, NULL, NULL);
// Stride of image data
err = clSetKernelArg(kernel, 3, sizeof(unsigned int), &Stride);
err |= clSetKernelArg(kernel, 5, sizeof(unsigned char), &Threshhold);
err |= clSetKernelArg(kernel, 6, sizeof(int), &Height);
// err now holds the combined status of the three clSetKernelArg calls
ErrorCheck(err, "Error: Failed to set kernel arguments! ");
clFinish(commands);
This particular piece of code takes 0.5-4 milliseconds (usually closer to 1) with exactly the same amount of data every time (a byte array holding a 1280x720 image). That is troubling, because the single-threaded CPU function takes 1 millisecond to run the whole FAST algorithm. Am I just not going to be able to match the speed of the CPU? Or am I passing data around wrong? I'd be glad to post any other pieces of code that may be relevant; I just didn't want to flood the thread with my whole function XD
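To put rough numbers on those timings, here is a small back-of-the-envelope sketch of the throughput they imply. It assumes the image is a single 8-bit channel with no row padding (so one byte per pixel), and the helper names `image_bytes` and `effective_gb_per_s` are made up for illustration; they are not part of OpenCL.

```c
#include <stddef.h>

/* Bytes in a single-channel 8-bit image.
 * Assumption: no row padding, i.e. stride == width. */
size_t image_bytes(size_t width, size_t height) {
    return width * height;
}

/* Effective throughput in GB/s (1 GB = 1e9 bytes) for a transfer
 * of `bytes` bytes that took `millis` milliseconds. */
double effective_gb_per_s(size_t bytes, double millis) {
    double seconds = millis / 1e3;
    return ((double)bytes / 1e9) / seconds;
}
```

For the 1280x720 image, `image_bytes(1280, 720)` is 921,600 bytes, so a 1 ms copy works out to roughly 0.92 GB/s, and the 0.5-4 ms range spans roughly 1.8 GB/s down to 0.23 GB/s. If those figures are well below what the bus can sustain on my machine, that would suggest the time is going to fixed per-call overhead rather than the copy itself, though I have not verified that.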