
Thread: How to measure GPU performance

  1. #1
    Junior Member
    Join Date
    Oct 2013
    Posts
    22

    How to measure GPU performance

    Hello,

    I want to measure the performance of the GPU in an NVIDIA GeForce 9400 GT.

    The steps in the host code are:

    clSetKernelArg

    clCreateCommandQueue

    *start measure
    clEnqueueNDRangeKernel

    clEnqueueReadBuffer
    *end measure


    To compare, I did the same calculation on the host without the GPU.
    It seems that even when the kernel does nothing, the GPU is only 5 times faster than the host.

    This does not make sense; it should be much faster. The NVIDIA card has 16 cores, each running at 1.4 GHz, while the host is a Core 2 Duo running at 3 GHz.

    What is wrong with my measurement?

    Thanks,
    Zvika

  2. #2
    Newbie
    Join Date
    Nov 2013
    Posts
    2
    Hi, my understanding is that the GPU and the CPU differ in their startup overhead. Could you post more detailed timing information?
    Jianbin

  3. #3
    Senior Member
    Join Date
    Oct 2012
    Posts
    165
    Timing from outside OpenCL only makes sense when you use blocking calls. If you only want to see how long your kernel runs, have a look at the kernel's profiling events.

  4. #4
    Junior Member
    Join Date
    Oct 2013
    Posts
    22
    This is the way to do it according to "OpenCL in Action" by Matthew Scarpino:

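    cl_event prof_event;
    cl_ulong time_start, time_end, total_time;
    ....
    /* Profiling must be enabled when the command queue is created. */
    queue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err);
    ....
    /* The last argument returns an event that carries the profiling data. */
    err = clEnqueueNDRangeKernel(queue, kernel, dim, global_offset,
                                 global_size, NULL, 0, NULL, &prof_event);
    /* Wait for the kernel to finish before querying its timestamps. */
    clFinish(queue);
    clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_START,
                            sizeof(time_start), &time_start, NULL);
    clGetEventProfilingInfo(prof_event, CL_PROFILING_COMMAND_END,
                            sizeof(time_end), &time_end, NULL);

    /* Device timestamps are in nanoseconds. */
    total_time = time_end - time_start;
    clReleaseEvent(prof_event);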

  5. #5
    Senior Member
    Join Date
    Dec 2011
    Posts
    163
    zvivered is correct: you can measure kernel execution time using events. Putting timers around the enqueue calls only measures how long the enqueue calls themselves take (although if your read is blocking, the timer will also capture the kernel execution and the read).

    You can also use NVIDIA Parallel Nsight or AMD APP Profiler to see timeline traces of memory transfers and kernel execution times, as well as summaries showing min/max/averages.
