
Thread: different time of cpu and gpu

  1. #1

    different time of cpu and gpu

    I am using an NVIDIA GPU with OpenCL. When I run the program with Ctrl+F5 (Start Without Debugging), the GPU takes more time than the CPU, but when I run it with debugging, the CPU takes more time than the GPU. The results are:
    start without debugging -> cpu time = 6127 ms, gpu time = 6240 ms
    start with debugging -> cpu time = 18354 ms, gpu time = 9125 ms

    What is the reason for this difference?
    I am using Visual Studio 2010.
    The code is below. What is going wrong? Thanks.


    // Hello.cpp : Defines the entry point for the console application.
    //

    //#include <stdafx.h>
    #include<stdio.h>
    #include<stdlib.h>
    #include<conio.h>
    #include<time.h>
    #include "CL/cl.h"
    #define DATA_SIZE 100000
    const char *KernelSource =
    "kernel void hello(global float *input , global float *output)\n"\
    "{\n"\
    " size_t id =get_global_id(0);\n"\
    "output[id] =input[id]*input[id];\n"\
    "} "
    "\n"\
    "\n";
    //float start_time,end_time;

    int main(void)
    {
    double start_time,end_time;
    start_time=clock();
    cl_context context;
    cl_context_properties properties[3];
    cl_kernel kernel;
    cl_command_queue command_queue;
    cl_program program;
    cl_int err;
    cl_uint num_of_platforms=0;
    cl_platform_id platform_id;
    cl_device_id device_id;
    cl_uint num_of_devices=0;
    cl_mem input,output;
    size_t global;
    float inputData[DATA_SIZE];
    for(int j=0;j<DATA_SIZE;j++)
    {
    inputData[j]=(float)j;
    }

    float results[DATA_SIZE];//={0};

    // int i;

    //retrieve a list of platform variable
    if(clGetPlatformIDs(1,&platform_id,&num_of_platforms)!=CL_SUCCESS)
    {
    printf("Unable to get platform_id\n");
    return 1;
    }

    //try to get a supported device (CL_DEVICE_TYPE_CPU for the CPU run, CL_DEVICE_TYPE_GPU for the GPU run)
    if(clGetDeviceIDs(platform_id,CL_DEVICE_TYPE_CPU,1,&device_id,
    &num_of_devices)!=CL_SUCCESS)
    {
    printf("unable to get device_id\n");
    return 1;
    }

    //context properties list -must be terminated with 0
    properties[0]=CL_CONTEXT_PLATFORM;
    properties[1]=(cl_context_properties) platform_id;
    properties[2]=0;

    //create a context with the selected device
    context=clCreateContext(properties,1,&device_id,NULL,NULL,&err);

    //create command queue using the context and device
    command_queue=clCreateCommandQueue(context,device_id,0,&err);

    //create a program from the kernel source code
    program=clCreateProgramWithSource(context,1,(const char**)
    &KernelSource,NULL,&err);

    //compile the program
    err=clBuildProgram(program,0,NULL,NULL,NULL,NULL);
    if((err!=CL_SUCCESS))
    {
    printf("build error %d\n",err);
    size_t len;
    char buffer[4096];
    //get the build log
    clGetProgramBuildInfo(program,device_id,CL_PROGRAM_BUILD_LOG,sizeof(buffer),buffer,&len);
    printf("----build Log---\n%s\n",buffer);
    exit(1);

    // return 1;
    }

    //specify which kernel from the program to execute
    kernel=clCreateKernel(program,"hello",&err);

    //create buffers for the input and output
    input=clCreateBuffer(context,CL_MEM_READ_ONLY,sizeof(float)*DATA_SIZE,NULL,NULL);

    output=clCreateBuffer(context,CL_MEM_WRITE_ONLY,sizeof(float)*DATA_SIZE,NULL,NULL);

    //load data into the input buffer

    clEnqueueWriteBuffer(command_queue,input,CL_TRUE,0,
    sizeof(float)*DATA_SIZE,inputData,0,NULL,NULL);

    //set the argument list for the kernel command
    clSetKernelArg(kernel,0,sizeof(cl_mem),&input);
    clSetKernelArg(kernel,1,sizeof(cl_mem),&output);
    global=DATA_SIZE;

    //enqueue the kernel command for execution
    clEnqueueNDRangeKernel(command_queue,kernel,1,NULL,&global,NULL,0,NULL,NULL);
    clFinish(command_queue);

    //copy the results from out of the buffer
    clEnqueueReadBuffer(command_queue,output,CL_TRUE,0,sizeof(float)*DATA_SIZE,results,0,
    NULL,NULL);

    //print the results
    printf("output:");
    for(int i=0;i<DATA_SIZE;i++)
    {
    printf("%f\n",results[i]);
    //printf("no. of times loop run %d\n",count);
    }

    //cleanup-release OpenCL resources

    clReleaseMemObject(input);
    clReleaseMemObject(output);
    clReleaseProgram(program);
    clReleaseKernel(kernel);
    clReleaseCommandQueue(command_queue);
    clReleaseContext(context);
    end_time=clock();
    printf("execution time is %f ms\n",(end_time-start_time)*1000.0/CLOCKS_PER_SEC);
    _getch();
    return 0;

    }

  2. #2

    Re: different time of cpu and gpu

    When you run your code without debugging, it should execute faster since the debugger doesn't have to sit in the background waiting for exceptions or allowing you to pause the program. Your kernel is quite simple and won't stretch the processing power of your GPU, hence the CPU should be faster when not debugging.

    The reason for your kernel not being faster on the GPU is that for every two pieces of data you transfer, you do one maths operation. It is faster for the CPU to read those elements from RAM than it is to transfer them from RAM to GPU because the PCIe bus is slow. Ideally, you want to do a lot of operations on the GPU for each element of data that gets sent to the GPU.

    I also just noticed that you are timing the entire program rather than just the kernel and data transfers. This is unfair to OpenCL as you are also timing how long it takes to compile your kernel.

  3. #3

    Re: different time of cpu and gpu

    Please tell me, I am new to OpenCL. Whenever I increase the value of DATA_SIZE I get a stack overflow, so I am not able to show that the GPU performs better than the CPU for larger data. I am timing the whole program because in both cases the results are read back on the CPU, which is why I put the timer there. Please suggest any useful modification. Please help.



    Quote Originally Posted by chippies
    When you run your code without debugging, it should execute faster since the debugger doesn't have to sit in the background waiting for exceptions or allowing you to pause the program. Your kernel is quite simple and won't stretch the processing power of your GPU, hence the CPU should be faster when not debugging.

    The reason for your kernel not being faster on the GPU is that for every two pieces of data you transfer, you do one maths operation. It is faster for the CPU to read those elements from RAM than it is to transfer them from RAM to GPU because the PCIe bus is slow. Ideally, you want to do a lot of operations on the GPU for each element of data that gets sent to the GPU.

    I also just noticed that you are timing the entire program rather than just the kernel and data transfers. This is unfair to OpenCL as you are also timing how long it takes to compile your kernel.

  4. #4

    Re: different time of cpu and gpu

    If you continue using a simple element-wise operation such as squaring a vector to demonstrate the speed of a GPU vs. a CPU then the CPU will always do well. You should try to demonstrate a different problem.

    Taking an example from linear algebra, a popular operation is matrix multiplication. If you multiply two NxN matrices then the amount of data that gets sent to the GPU is 2N^2 elements, while the naive algorithm needs N multiplications and N-1 additions for each of the N^2 output elements, i.e. N^2(2N-1) floating point operations. Taking the ratio of operations to matrix elements sent gives N^2(2N-1) / (2N^2) = (2N-1)/2, or roughly N. For a 1000x1000 matrix, that means that about a thousand floating point operations get performed for each element that got transferred to the GPU, compared with one operation per element sent for your kernel. In such a situation, the GPU excels because one sends a bit of data to the GPU and then sits back and waits for a very long calculation to finish.

    As for working around your stack overflow when increasing DATA_SIZE, the mistake that you have made is to declare inputData as an array of float inside the main function. Arrays declared there will use the stack, which is a small (comparatively) section of memory. You should rather work with a pointer to an array of float, i.e. float* inputData = new float[DATA_SIZE]; <<lines of code>> delete[] inputData; That way, the array gets allocated on a part of memory called the heap, which can be multiple gigabytes if you are using a 64-bit operating system. You would have to do the same thing for results, i.e. float* results ...

  5. #5

    Re: different time of cpu and gpu

    thanks for quick and valuable reply
    thanks

  6. #6

    Re: different time of cpu and gpu

    As you said earlier, it is unfair to time the whole program, so I used clGetEventProfilingInfo to measure the kernel time for the CPU and the GPU, but I am getting an error. Could you help?
    My program is:
    // Hello.cpp : Defines the entry point for the console application.
    //

    //#include <stdafx.h>
    #include<iostream>
    #include<stdio.h>
    #include<stdlib.h>
    #include<conio.h>
    #include<time.h>
    #include "CL/cl.h"
    #define DATA_SIZE 100000
    using namespace std;
    const char *KernelSource =
    "kernel void hello(global float *input , global float *output)\n"\
    "{\n"\
    " size_t id =get_global_id(0);\n"\
    "output[id] =input[id]*input[id];\n"\
    "} "
    "\n"\
    "\n";
    //float start_time,end_time;

    int main(void)
    {
    cl_ulong start_time,end_time,elapsed_time;
    //start_time=clock();
    cl_context context;
    cl_context_properties properties[3];
    cl_kernel kernel;
    cl_command_queue command_queue;
    cl_program program;
    cl_int err;
    cl_uint num_of_platforms=0;
    cl_platform_id platform_id;
    cl_device_id device_id;
    cl_uint num_of_devices=0;
    cl_mem input,output;
    cl_event gpuExec;
    size_t global;
    float executionTimeInSeconds;


    float inputData[DATA_SIZE];
    for(int j=0;j<DATA_SIZE;j++)
    {
    inputData[j]=(float)j;
    }

    float results[DATA_SIZE];//={0};

    // int i;

    //retrieve a list of platform variable
    if(clGetPlatformIDs(1,&platform_id,&num_of_platforms)!=CL_SUCCESS)
    {
    printf("Unable to get platform_id\n");
    return 1;
    }

    //try to get supported GPU DEvice
    if(clGetDeviceIDs(platform_id,CL_DEVICE_TYPE_GPU,1,&device_id,
    &num_of_devices)!=CL_SUCCESS)
    {
    printf("unable to get device_id\n");
    return 1;
    }

    //context properties list -must be terminated with 0
    properties[0]=CL_CONTEXT_PLATFORM;
    properties[1]=(cl_context_properties) platform_id;
    properties[2]=0;

    //create a context with the GPU device
    context=clCreateContext(properties,1,&device_id,NULL,NULL,&err);

    //create command queue using the context and device
    command_queue=clCreateCommandQueue(context,device_id,0,&err);

    //create a program from the kernel source code
    program=clCreateProgramWithSource(context,1,(const char**)
    &KernelSource,NULL,&err);

    //compile the program
    err=clBuildProgram(program,0,NULL,NULL,NULL,NULL);
    if((err!=CL_SUCCESS))
    {
    printf("build error %d\n",err);
    size_t len;
    char buffer[4096];
    //get the build log
    clGetProgramBuildInfo(program,device_id,CL_PROGRAM_BUILD_LOG,sizeof(buffer),buffer,&len);
    printf("----build Log---\n%s\n",buffer);
    exit(1);

    // return 1;
    }

    //specify which kernel from the program to execute
    kernel=clCreateKernel(program,"hello",&err);

    //create buffers for the input and output
    input=clCreateBuffer(context,CL_MEM_READ_ONLY,sizeof(float)*DATA_SIZE,NULL,NULL);

    output=clCreateBuffer(context,CL_MEM_WRITE_ONLY,sizeof(float)*DATA_SIZE,NULL,NULL);

    //load data into the input buffer

    clEnqueueWriteBuffer(command_queue,input,CL_TRUE,0,
    sizeof(float)*DATA_SIZE,inputData,0,NULL,NULL);

    //set the argument list for the kernel command
    clSetKernelArg(kernel,0,sizeof(cl_mem),&input);
    clSetKernelArg(kernel,1,sizeof(cl_mem),&output);
    global=DATA_SIZE;

    //enqueue the kernel command for execution
    clEnqueueNDRangeKernel(command_queue,kernel,1,NULL,&global,NULL,0,NULL,&gpuExec);
    clFinish(command_queue);

    //copy the results from out of the buffer
    clEnqueueReadBuffer(command_queue,output,CL_TRUE,0,sizeof(float)*DATA_SIZE,results,0,
    NULL,NULL);

    //print the results
    printf("output:");
    for(int i=0;i<DATA_SIZE;i++)
    {
    printf("%f\n",results[i]);
    //printf("no. of times loop run %d\n",count);
    }
    //Calculating the time.......

    clGetEventProfilingInfo(gpuExec, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start_time, NULL);
    clGetEventProfilingInfo(gpuExec, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end_time, NULL);

    /*calculate total elapsed time*/
    elapsed_time = end_time-start_time;
    executionTimeInSeconds = (float)(1.0e-9 *elapsed_time);
    //printf("%f",&executionTimeInSeconds);
    cout<<"execution time"<<executionTimeInSeconds;
    _getch();
    /*end_time=clock();
    printf("execution time is%f",end_time-start_time);
    _getch();*/
    //cleanup-release OpenCL resources

    clReleaseMemObject(input);
    clReleaseMemObject(output);
    clReleaseProgram(program);
    clReleaseKernel(kernel);
    clReleaseCommandQueue(command_queue);
    clReleaseContext(context);

    return 0;

    }
    -------------------------------------------------------------------------------------

    When I run this program on Windows 7 64-bit with VS 2010, I get this error:
    Problem signature:
    Problem Event Name: APPCRASH
    Application Name: Hello.exe
    Application Version: 0.0.0.0
    Application Timestamp: 50a46359
    Fault Module Name: igdrcl64.dll
    Fault Module Version: 8.15.10.2712
    Fault Module Timestamp: 4f7119e9
    Exception Code: c0000005
    Exception Offset: 000000000001b5e9
    OS Version: 6.1.7601.2.1.0.768.2
    Locale ID: 16393
    Additional Information 1: a493
    Additional Information 2: a493a1183067b5107213879be06ab3eb
    Additional Information 3: c660
    Additional Information 4: c660a08c8ef8ab0df12390ba7124949d



    thanks in advance

  7. #7

    Re: different time of cpu and gpu

    To use the profiling commands, you first need to enable profiling for the specific command queue. The line

    Code :
    command_queue=clCreateCommandQueue(context,device_id,0,&err);

    should be changed to

    Code :
    command_queue=clCreateCommandQueue(context,device_id,CL_QUEUE_PROFILING_ENABLE,&err);

    That enables profiling.

  8. #8

    Re: different time of cpu and gpu

    Thanks a lot.
    This forum rocks.
    One more question: when I use clGetEventProfilingInfo, which time does it measure? Is it

    (data transfer from CPU to GPU + GPU processing time + transfer of the results back from GPU to CPU)
    or only (GPU processing time)?


    Thanks

  9. #9

    Re: different time of cpu and gpu

    CL_PROFILING_COMMAND_START (cl_ulong): A 64-bit value that describes the current device time counter in nanoseconds when the command identified by event starts execution on the device.

    CL_PROFILING_COMMAND_END (cl_ulong): A 64-bit value that describes the current device time counter in nanoseconds when the command identified by event has finished execution on the device.

    It's the beginning and end of your profiled event. So if you profile the kernel execution, it will be the kernel time; if you profile the writeBuffer, it will be your CPU-to-GPU transfer time.

  10. #10

    Re: different time of cpu and gpu

    Please reply, forum.
    Thanks
