# Thread: Can doing a loop calculation on a GPU be faster than on a CPU?

1. ## Can doing a loop calculation on a GPU be faster than on a CPU?

Hi, I'm working on my research about OpenCL programming.
I'm still new to GPGPU and having a really hard time understanding how it works.
Could anybody here help with my questions?

1. Does having a lot of kernels slow down execution?
2. Can doing a lot of loop calculations on a GPU be faster than on a CPU?

Let's say I have this sample code:
Code :
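```
__kernel void calcu_h(__global float* sum_h, __global float* w_hi,
                      __global int* unit_i, __global float* unit_h)
{
    int i, h, p;

    /* As written, every work item runs this whole loop nest. */
    for (p = 0; p < 26; p++) {
        for (h = 0; h < 100; h++) {
            /* Weighted sum over i = 0..100 inclusive (101 terms, row stride 100). */
            sum_h[(p*100)+h] = 0.0f;
            for (i = 0; i <= 100; i++)
                sum_h[(p*100)+h] += w_hi[(h*100)+i] * (float)unit_i[(p*100)+i];
            /* Sigmoid of the weighted sum. */
            unit_h[(p*100)+h] = 1.0f / (1.0f + exp(-sum_h[(p*100)+h]));
        }
        /* h == 100 after the loop, so this writes index p*100 + 100. */
        unit_h[(p*100)+h] = 1.0f;
    }
}
```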
What's the best way to break up these loops?

2. ## Re: Can doing a loop calculation on a GPU be faster than on a CPU?

1. Does having a lot of kernels slow down execution?
Most of the time a single kernel call is enough to perform a complete task in parallel, but it depends on the type of computation you are doing. To call a kernel you need to set its arguments, enqueue it for execution, and then read the results back from the device, so every call carries some fixed overhead. I personally don't think having many kernel calls in your application would be efficient. It is better to give your kernel some general data (for example, a pointer to a chunk of memory) and then do the index calculations inside the kernel with functions like get_global_id() or get_local_id().

2. Can doing a lot of loop calculations on a GPU be faster than on a CPU?
I think having many nested loops in your kernel is not a good idea. You have to remember that your kernel is executed once for every work item, in parallel. As posted, each work item would repeat the same full loop nest, so many nested loops will surely increase the overhead and slow down the parallel execution as a whole. Try to eliminate unnecessary loops, for example by mapping the outer loops onto work-item indices. Write your algorithms in a smarter way and try loop-unrolling techniques on what remains.