Multiple kernel invocations within kernel or via host
I want to chain together multiple kernels. Is it better to call the kernel functions from within a kernel, or to enqueue them one after another from the host?
Pseudocode below:
Kernel calling kernel
__kernel void vsubtract(__global float *a, __global float *b, __global float *c, const unsigned int count, unsigned int red)
{
    int i = get_global_id(0);
    if (i < count)
    {
        a[i] = b[i] - a[i];
        c[i] = a[i];
        a[i] = a[i] * a[i];
    }
    // call reduction kernel
    reduction(a, count, red);
}
Or host calling kernels:
vsubtract(cl::EnqueueArgs(queue, cl::NDRange(count), cl::NDRange(local)), d_a, d_b, d_c, count, red);
queue.enqueueReadBuffer(d_a, CL_TRUE, 0, sizeof(float) * LENGTH, &vector_a);
reduction(cl::EnqueueArgs(queue, cl::NDRange(count), cl::NDRange(local)), d_a, count, red);
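To make the intended result concrete, here is a minimal CPU-only C++ sketch of what the two chained steps should compute. The names `vsubtract_ref` and `reduction_ref` are hypothetical, and I am assuming the reduction is a plain sum, which the pseudocode above does not spell out.

```cpp
#include <cstddef>
#include <vector>

// CPU reference for the vsubtract step (hypothetical helper, not OpenCL code):
// c keeps the raw difference b - a, and a ends up holding its square.
void vsubtract_ref(std::vector<float>& a, const std::vector<float>& b,
                   std::vector<float>& c, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        a[i] = b[i] - a[i];
        c[i] = a[i];
        a[i] = a[i] * a[i];
    }
}

// Assumed meaning of the reduction step: sum of the first `count` elements.
float reduction_ref(const std::vector<float>& a, std::size_t count) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        sum += a[i];
    }
    return sum;
}
```

For example, with a = {1, 2, 3, 4} and b = {4, 4, 4, 4}, the differences stored in c are {3, 2, 1, 0}, a becomes {9, 4, 1, 0}, and the reduction should return 14.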
I would assume it is faster to have the kernel call the other kernels, since that avoids the additional data transfer between the host and the device.
Are there any issues that I need to be aware of if I have kernels calling kernels?