I use clGetEventProfilingInfo with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END to get the time taken by a kernel or a memory transfer on the GPU device.
But how can I measure the total time consumed by the clEnqueueNDRangeKernel command on both the CPU and the GPU?
Should I add the user CPU time (e.g. as returned by getrusage) to the time returned by clGetEventProfilingInfo for the event associated with clEnqueueNDRangeKernel?
You can use clGetEventProfilingInfo with CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_SUBMIT to find the delta between when the command was enqueued by the application and when it was submitted to the device. Similarly, the delta between CL_PROFILING_COMMAND_SUBMIT and CL_PROFILING_COMMAND_START tells you how long the command waited between being submitted to the device and actually starting to execute on it.
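
As a rough illustration of the deltas described above, here is a minimal sketch in C. It assumes the four timestamps have already been read from the event and only does the arithmetic, so it compiles without the OpenCL headers. The names event_times, event_breakdown and split_event_times are hypothetical helpers made up for this example, not part of OpenCL. The timestamps are in nanoseconds, and they are only available if the command queue was created with CL_QUEUE_PROFILING_ENABLE and the event has completed.

```c
#include <stdint.h>

/* Hypothetical container for the four profiling timestamps of one event.
 * OpenCL reports each of them as a cl_ulong (unsigned 64-bit) in nanoseconds.
 * With a real, completed event `evt`, each field would be filled like this:
 *
 *   clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_QUEUED,
 *                           sizeof(cl_ulong), &t.queued, NULL);
 *
 * and likewise for CL_PROFILING_COMMAND_SUBMIT, _START and _END. */
typedef struct {
    uint64_t queued;  /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit;  /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;   /* CL_PROFILING_COMMAND_START  */
    uint64_t end;     /* CL_PROFILING_COMMAND_END    */
} event_times;

/* Hypothetical result type: one duration per phase, in nanoseconds. */
typedef struct {
    uint64_t queue_ns;   /* enqueued by the application -> submitted to device */
    uint64_t submit_ns;  /* submitted to device -> started executing           */
    uint64_t exec_ns;    /* started executing -> finished                      */
    uint64_t total_ns;   /* enqueued -> finished                               */
} event_breakdown;

/* Split the lifetime of a command into its phases. */
event_breakdown split_event_times(event_times t)
{
    event_breakdown b;
    b.queue_ns  = t.submit - t.queued;
    b.submit_ns = t.start  - t.submit;
    b.exec_ns   = t.end    - t.start;
    b.total_ns  = t.end    - t.queued;
    return b;
}
```

For example, timestamps of 1000, 1500, 4000 and 9000 ns would give 500 ns in the queue, 2500 ns between submission and start, 5000 ns of execution, and 8000 ns in total.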