How to measure GPU performance
I want to measure the performance of the GPU in an NVIDIA GeForce 9400 GT.
The steps in the host code are:
In order to compare, I did the same calculation on the host without the GPU.
It seems that even when the kernel does nothing, the GPU works 5 times faster than the host.
This does not make sense; it should be much faster than that. The GeForce 9400 GT has 16 cores, each running at 1.4 GHz, while the host is a Core2Duo running at 3 GHz.
What is wrong with my measurement?
Hi, my understanding is that the GPU and CPU differ in their startup overhead. But it would help if you showed more timing information ...
Timing from outside OpenCL only makes sense when you use blocking calls. If you only want to see how long your kernel runs, have a look at the kernel's profiling events.
This is the way to do it according to "OpenCL in Action" by Matthew Scarpino:
cl_event prof_event;
cl_ulong time_start, time_end, total_time;

queue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err);
err = clEnqueueNDRangeKernel(queue, kernel, dim, global_offset,
                             global_size, NULL, 0, NULL, &prof_event);
clFinish(queue);  /* make sure the kernel has actually completed */
clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_START,
                        sizeof(time_start), &time_start, NULL);
clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_END,
                        sizeof(time_end), &time_end, NULL);
total_time = time_end - time_start;  /* in nanoseconds */
zvivered is correct, you can measure kernel execution using events. Putting timers around the enqueue calls only measures the enqueue speed (although I guess if your read is blocking you'll get something.)
You can also use NVIDIA Parallel Nsight or AMD APP Profiler to see timeline traces of memory transfers and kernel execution times, as well as summaries showing min/max/averages.