
Thread: Faster on CPU than on the GPU

  1. #1
    Junior Member (joined Aug 2010, 3 posts)

    All,

    I need to run a population-based code that computes an algebraic expression on the GPU and returns the results to the CPU. The host code seems to be set up correctly, but the CPU implementation is at least twice as fast as the GPU. My global work size is 200 for the CL code below.

    Here are my questions:

    1) Are there any obvious flaws in the CL code?

    2) Would a dedicated video card be any faster than a comparable on-board chip? I tried two platforms: 1) an iMac (3.06 GHz CPU with a GeForce 8800 GS) and 2) a MacBook Pro (2.26 GHz CPU with a GeForce 9400M).

    3) This is a straight OpenCL implementation from Apple. Would using CL with the CUDA architecture be any faster?

    Code :
    __kernel void clTestProblems(int funcindx, __global const float *dv, int nvars, __global float *fitval)
    {
    	int gid = get_global_id(0);	// one work-item per population member
     
    	if(funcindx == 1 || funcindx == 2 || funcindx == 3)
    	{
             ...
    	}
    	else if(funcindx == 4 || funcindx == 5 || funcindx == 6 || funcindx == 7)
    	{
    		// Ackley test function over this member's nvars design variables
    		float term1 = 0.0f;	// sum of x^2 (the f suffix keeps literals single precision)
    		float term2 = 0.0f;	// sum of cos(2*pi*x)
    		const float pi_2 = 2.0f * M_PI_F;	// 2 x pi, built-in OpenCL constant
    		const float e_1 = M_E_F;	// exp(1.0), built-in OpenCL constant
    		int offset = gid*nvars;	// first element of this member's row in dv
     
    		for(int i=0; i<nvars; i++)
    		{
    			float x = dv[offset+i];
    			term1 += x*x;	// pown() expects an int exponent; x*x avoids the call entirely
    			term2 += cos(pi_2*x);
    		}
    		term1 = term1/((float) nvars);
    		term1 = -0.2f*sqrt(term1);
    		term1 = -20.0f*exp(term1);
     
    		term2 = term2/((float) nvars);
    		term2 = exp(term2);
     
    		fitval[gid] = term1 - term2 + 20.0f + e_1;
    	}
    }


    Thanks,
    Vijay.

  2. #2

    Re: Faster on CPU than on the GPU

    Quote Originally Posted by vijaykiran:
        My global work size is 200 for the CL code below.

    200 work-items (threads) are far too few to keep a GPU busy. Scale the problem up and the GPU will probably catch up to and surpass the CPU. I haven't studied your particular program, but that's the way it usually goes.
