Hello everyone.
I have a problem executing a kernel with a large number of threads.
I launch the kernel like this:
Code :
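size_t count = 200; /* work sizes are passed as const size_t*, not int* */
/* global size == local size, so the whole range is one work-group */
cl_int err = clEnqueueNDRangeKernel(cmd_queue, my_kernel, 1, NULL, &count, &count, 0, NULL, NULL);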
Everything works fine with count = 200, but when I change it to 300 or more, clEnqueueNDRangeKernel immediately returns CL_OUT_OF_RESOURCES. The kernel code is below; it simply computes a Cholesky decomposition.
Code :
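kernel void cholesky_decomposition(global float* A, int n)
{
	// In-place Cholesky (A = L*L^T) on a row-major n x n matrix; the lower
	// triangle is overwritten with L. Must run as ONE work-group, because
	// barrier() only synchronizes work-items within the same group.
	unsigned int p = get_local_size(0);  // number of work-items in the group
	unsigned int u = get_local_id(0);    // index of this work-item
	for (unsigned int k=0; k<n; k++) {
		float s = sqrt(A[k*n + k]);
		// Every work-item must read the diagonal before work-item 0
		// overwrites it below (u == 0 gives i == k on the first pass).
		barrier(CLK_GLOBAL_MEM_FENCE);
		// Scale column k, rows k..n-1, strided across work-items.
		for (unsigned int i=k+u; i<n; i+=p) {
			A[i*n + k] /= s;
		}
		barrier(CLK_GLOBAL_MEM_FENCE);
		// Update trailing columns; work-item u owns j = k+1+u, k+1+u+p, ...
		for (unsigned int j=k+1+u; j<n; j+=p) {
			for (unsigned int i=j; i<n; i++) {
				A[i*n + j] -= A[j*n+k]*A[i*n+k];
			}
		}
		barrier(CLK_GLOBAL_MEM_FENCE);
	}
}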
I can't figure out what is causing this error. The maximum work-group size of my device (GeForce 8800GT) is 512, it supports OpenCL 1.0, I have the latest driver from the NVIDIA site, and the OS is Windows XP SP3. The same problem occurs with other kernels.
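One thing that may be worth checking (an assumption, not confirmed here): the device's CL_DEVICE_MAX_WORK_GROUP_SIZE of 512 is only an upper bound. The limit for a specific compiled kernel is reported by clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE and can be lower, and the OpenCL specification lists an explicitly given local_work_size that needs more registers or local memory than available as one cause of CL_OUT_OF_RESOURCES. A minimal sketch of clamping the launch size to that per-kernel limit follows; `choose_work_size` and `kernel_max` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pick the size for a kernel that must run as a single
 * work-group (global size == local size, as in the launch above).
 * kernel_max is assumed to hold CL_KERNEL_WORK_GROUP_SIZE, e.g. from
 *   clGetKernelWorkGroupInfo(my_kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE,
 *                            sizeof(size_t), &kernel_max, NULL);
 * which also accounts for the kernel's register and local-memory usage. */
size_t choose_work_size(size_t requested, size_t kernel_max)
{
    assert(kernel_max > 0);  /* a valid kernel always allows at least 1 */
    size_t p = requested < kernel_max ? requested : kernel_max;
    return p > 0 ? p : 1;    /* never launch an empty work-group */
}
```

Because the kernel strides its loops by get_local_size(0), it should stay correct for any group size of at least 1; a smaller group just gives each work-item more iterations. The result would be passed as both the global and the local size.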
I'd appreciate any help with this problem. Thanks in advance =)