Page 1 of 2 12 LastLast
Results 1 to 10 of 16

Thread: command queue goes dead after clEnqueueNDRangeKernel ?

  1. #1

    command queue goes dead after clEnqueueNDRangeKernel ?

    I've been trying to triangulate a really stubborn bug for the past couple days. I'm testing an algorithm to compute high dimensional integrals using a parallel "divide and conquer" approach. The outline is fairly simple: starting from two input arrays of all boundary data, each integration work item gets the boundary data for its subregion, computes integral and error estimates, and writes these values into respective output arrays.

    I'm building everything on a MacBook Air with an NVIDIA GeForce 320M using Apple's distribution of OpenCL 1.0.

    I want to emphasize that when I run the OpenCL code for CL_DEVICE_TYPE_CPU, the algorithm works perfectly. The computations concur with the mathematically exact values up to numerical tolerance.

    It is when I run the code for CL_DEVICE_TYPE_GPU that problems begin.

    Instead of posting all of my code, I'll try to highlight what I think is relevant. The prototype of my kernel looks like this:
    Code :
    __kernel void
    mgk13s(int dim, __global void* params, 
    	__global const float* a, __global const float* b,
    	__global float* result, __global float* rawerr, __global float* resabs)
    In my main routine, I'm denoting the number of work items by global_ws. The dimension of the domain (and hence boundary data) is denoted by dim. Allocating memory, buffers and setting up kernel arguments looks like this:
    Code :
    	size_t out_size = global_ws * sizeof(float);
    	size_t in_size =  dim * out_size;
    	float* a = (float*) malloc(in_size);
    	float* b = (float*) malloc(in_size);
    	float* result = (float*) malloc(out_size);
    	float* rawerr = (float*) malloc(out_size);
    	float* resabs = (float*) malloc(out_size);
     
    	cl_mem par_buf = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(params), NULL, NULL);
    	cl_mem a_buf = clCreateBuffer(context, CL_MEM_READ_ONLY, in_size, NULL, NULL);
    	cl_mem b_buf = clCreateBuffer(context, CL_MEM_READ_ONLY, in_size, NULL, NULL);
    	cl_mem res_buf = clCreateBuffer(context, CL_MEM_READ_WRITE, out_size, NULL, NULL);
    	cl_mem err_buf = clCreateBuffer(context, CL_MEM_READ_WRITE, out_size, NULL, NULL);
    	cl_mem abs_buf = clCreateBuffer(context, CL_MEM_READ_WRITE, out_size, NULL, NULL);
     
    	err  = clSetKernelArg(kernel, 0, sizeof(int), &dim);
    	err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &par_buf);
    	err |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &a_buf);
    	err |= clSetKernelArg(kernel, 3, sizeof(cl_mem), &b_buf);
    	err |= clSetKernelArg(kernel, 4, sizeof(cl_mem), &res_buf);
    	err |= clSetKernelArg(kernel, 5, sizeof(cl_mem), &err_buf);
    	err |= clSetKernelArg(kernel, 6, sizeof(cl_mem), &abs_buf);
    The code which executes the black box outlined above looks like this:
    Code :
    	err = clEnqueueWriteBuffer(queue, par_buf, CL_TRUE, 0, sizeof(params), params, 0, NULL, NULL);
    	err |= clEnqueueWriteBuffer(queue, a_buf, CL_TRUE, 0, in_size, a, 0, NULL, NULL);
    	err |= clEnqueueWriteBuffer(queue, b_buf, CL_TRUE, 0, in_size, b, 0, NULL, NULL);
    	if (err != CL_SUCCESS) ABORT(err);
     
    	err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_ws, &local_ws, 0, NULL, NULL);
    	if (err != CL_SUCCESS) ABORT(err);
    	err = clEnqueueBarrier(queue);
     
    	err = clEnqueueReadBuffer(queue, res_buf, CL_TRUE, 0, out_size, result, 0, NULL, NULL);
    	if (err != CL_SUCCESS) ABORT(err);
    	err |= clEnqueueReadBuffer(queue, err_buf, CL_TRUE, 0, out_size, rawerr, 0, NULL, NULL);
    	err |= clEnqueueReadBuffer(queue, abs_buf, CL_TRUE, 0, out_size, resabs, 0, NULL, NULL);
    The ABORT macro just displays the line number and CL_error information, which is how I'm isolating the problem.

    For certain parameter values, the first clEnqueueReadBuffer call returns an error of type CL_INVALID_COMMAND_QUEUE, although clEnqueueNDRangeKernel call returns CL_SUCCESS.

    Any ideas here?

    Incidentally, global and local work sizes for this example are, local_ws = 64, global_ws = 4096. These are set at the beginning, using an algorithm that distinguishes between the CPU and GPU:
    Code :
    	size_t local_ws = (DEVICE == CL_DEVICE_TYPE_GPU) ? 64 : 1;
    	size_t factor = N / local_ws;
    	size_t global_ws = (N % local_ws) ? ((factor + 1) * local_ws) : N;

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: command queue goes dead after clEnqueueNDRangeKernel ?

    For certain parameter values, the first clEnqueueReadBuffer call returns an error of type CL_INVALID_COMMAND_QUEUE, although clEnqueueNDRangeKernel call returns CL_SUCCESS.

    Any ideas here?
    That sounds like enqueuing the kernel works fine, while actually trying to execute it is failing for one reason or another. It could be causing an invalid memory access (page fault) or it could be something else.

    Have you passed a callback function "pfn_notify" to clCreateContext()? The callback typically provides more information than the error codes alone.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3

    Re: command queue goes dead after clEnqueueNDRangeKernel ?

    No, I haven't tried that yet. If you have some expertise in this area, what kind of information should I be probing using the call back function?

    This really is a frustrating bug. I've downloaded other source trees for parallel computations on a GPU that demand even more resources and follow the same "read in, compute, write back" paradigm, and these compile and run on my machine without issue.

  4. #4

    Re: command queue goes dead after clEnqueueNDRangeKernel ?

    If new information is helpful, I created the the context with the following simple call back function:
    Code :
    void pfn_notify(const char *errinfo, const void *private_info, size_t cb, void *user_data)
    {
    	fprintf(stderr, "%s\n", errinfo);
    }

    And I edited the enqueuing and read-back part of my code as follows:
    Code :
    	err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_ws, &local_ws, 0, NULL, NULL);
    	fprintf(stderr, "line %d: err %d\n", __LINE__, err);
    	err = clEnqueueBarrier(queue);
    	fprintf(stderr, "line %d: err %d\n", __LINE__, err);
     
    	err = clEnqueueReadBuffer(queue, res_buf, CL_TRUE, 0, out_size, result, 0, NULL, NULL); \
    	fprintf(stderr, "line %d:  err %d\n", __LINE__, err);
    	err |= clEnqueueReadBuffer(queue, err_buf, CL_TRUE, 0, out_size, rawerr, 0, NULL, NULL);
    ...

    Passing through the "fprint(stderr" traps, the output is:
    Code :
    line 171: err 0
    line 173: err 0
    [CL_INVALID_COMMAND_QUEUE] : OpenCL Fatal Error : Read caused an error that invalidated the queue (0x100403bd0). This may be  due to a resource allocation failure at execution time.
    [CL_INVALID_COMMAND_QUEUE] : OpenCL Error : clEnqueueReadBuffer failed: Invalid command queue
    line 177:  err -36

    The error value -36 is just CL_INVALID_COMMAND_QUEUE. This doesn't tell me much more of anything, but maybe someone else can parse this better than I.

    Here's where it gets really weird:
    • the identical code executes perfectly when the device CL_DEVICE_TYPE_CPU is specified instead of _GPU and[/*:m:2ycto2be]
    • when executing on the device CL_DEVICE_TYPE_GPU but for smaller parameter values, there is no CL_INVALID_COMMAND_QUEUE error, however values in result, rawerr and resabs returned from res_buf, err_buf and abs_buf respectively are just zeros.[/*:m:2ycto2be]


    Thanks to everyone who has taken the time to read this through and think about it for a bit!

  5. #5
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: command queue goes dead after clEnqueueNDRangeKernel ?

    OpenCL Fatal Error : Read caused an error that invalidated the queue
    In other words, the work-item is dereferencing a pointer and that's causing a page fault. Could it have anything to do with the different local size you've chosen for GPU and CPU?
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  6. #6

    Re: command queue goes dead after clEnqueueNDRangeKernel ?

    Hi David, you might be on to something. I really didn't know how to select the global and local work sizes, except that the spec requires the global to be evenly divisible by the local.

    http://www.khronos.org/registry/cl/sdk/ ... ernel.html

    For my machine,
    • CL_DEVICE_MAX_WORK_ITEM_SIZES = 512 for the GPU [/*:m:g23dcibi]
    • CL_DEVICE_MAX_WORK_ITEM_SIZES = 1 for the CPU (as expected). [/*:m:g23dcibi]

    I got the numbers local_ws = 64 and global_ws = n * 64 for the GPU from another project that compiles and runs on my machine.

    Any additional advice on this matter is most welcome. I'll play around with variations on these values when I get some time later this evening.

  7. #7
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: command queue goes dead after clEnqueueNDRangeKernel ?

    If you don't know which values to pass for the local size, I would simply pass NULL. That tells the OpenCL implementation to choose a suitable value. That way you don't need to worry about being evenly divisible.

    That said, it may still trigger the same bug you are seeing today. Why don't you try a local work size of 1, which is the same value you used for the CPU and that seems to work? Although it will run slowly it will give us some more info on what's going wrong.

    Finally, could you take your kernel code and try to simplify it as much as possible while maintaining the error you are seeing? After you've simplified it as much as you can please post the code here and we can give it a look.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  8. #8

    Re: command queue goes dead after clEnqueueNDRangeKernel ?

    Okay. I had a lot of concurrent bugs in my code. The only one that I still can't eradicate is CL_INVALID_COMMAND_QUEUE error.

    Here is the kernel that I'm using; the source file is test4.cl.
    Code :
    __kernel void
    test4(size_t dim, __global float* out)
    {
    	size_t offset = get_global_id(0);
     
    	// set M = n^dim
    	size_t M = 1, n = 10;
    	for (size_t i = 0; i < dim; ++i) 
    		M *= n;
     
    	// enumerate multi-indices k = (k_1,..., k_d), 0 ? k_i ? n-1,
    	// but don't do anything else!
    	for (size_t j = 0; j < M; ++j) {
    		size_t m = 1;
    		for (size_t i = 0; i < dim; ++i) {
    			size_t ki = (j / m) % n;
    			m *= n; 
    		}
    	}
    }
    If you recognize that this kernel doesn't do anything except count to 10^dim and so some trivial arithmetic, you're right!

    Here is the host code:
    Code :
    /* test4.c */
     
    #include <stdio.h>
    #include <stdlib.h>
    #include <assert.h>
    #include <sys/stat.h>
    #include <OpenCL/OpenCL.h>
     
    #define CL_CPPFLAGS ""
     
    #ifdef GPU
    #define DEVICE CL_DEVICE_TYPE_GPU
    #define DEVICE_STR "test4.cl compiled on GPU"
    #else
    #define DEVICE CL_DEVICE_TYPE_CPU
    #define DEVICE_STR "test4.cl compiled on CPU"
    #endif
     
    char* load_program_source(const char *filename);
     
    int main(int argc, char** argv)
    {
    	size_t dim = (argc > 1) ? atoi(argv[1]) : 8; // dimension copy array
    	size_t global_ws = (argc > 2) ? atoi(argv[2]) : 1; // number of work items
     
    	printf("%s\n", DEVICE_STR);
     
    	const char* quad_kernel = "test4.cl";
    	const char* kernel_name = "test4";
     
    	int err;
    	cl_context context;	
    	cl_device_id device;
    	cl_command_queue queue;
    	cl_program program;
    	cl_kernel kernel;
     
    	err = clGetDeviceIDs(NULL, DEVICE, 1, &device, NULL);
     
    #ifdef __CL_EXT_H  // Apple extension for error logging.
    	context = clCreateContext(0, 1, &device, &clLogMessagesToStderrAPPLE, NULL, &err);
    #else
    	context = clCreateContext(0, 1, &device, NULL, NULL, &err);
    #endif
    	queue = clCreateCommandQueue(context, device, 0, NULL);
     
    	char scratch[2048];
    	char* source = load_program_source(quad_kernel);
     
    	program = clCreateProgramWithSource(context, 1, (const char**)&source, NULL, &err);
    	err = clBuildProgram(program, 1, &device, CL_CPPFLAGS, NULL, NULL);
    	err |= clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 2048, scratch, NULL);
    	if (err != CL_SUCCESS) {
    		fprintf(stderr, "[BUILD LOG]\n%s\n", scratch);
    		return err; 
    	}
    	kernel = clCreateKernel(program, kernel_name, &err);
     
    	size_t out_size = global_ws * sizeof(float);
     
    	float* out_loc = (float*) malloc(out_size);
    	cl_mem out_buf = clCreateBuffer(context, CL_MEM_WRITE_ONLY, out_size, NULL, NULL);
     
    	err = clSetKernelArg(kernel, 0, sizeof(size_t), &dim);
    	err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &out_buf);
    	assert(err == CL_SUCCESS);
     
    	err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_ws, NULL, 0, NULL, NULL);
    	err = clEnqueueBarrier(queue);
    	assert(err == CL_SUCCESS);
     
    	err = clEnqueueReadBuffer(queue, out_buf, CL_TRUE, 0, out_size, out_loc, 0, NULL, NULL);
    	assert(err == CL_SUCCESS);
     
    	free(out_loc);
    	clReleaseMemObject(out_buf);
     
    	clReleaseKernel(kernel);
    	clReleaseProgram(program);
    	clReleaseCommandQueue(queue);
    	clReleaseContext(context);
     
    	return 0;
    }
     
    char* load_program_source(const char *filename)
    { 
    	struct stat statbuf;
    	FILE *fh; 
    	char *source; 
     
    	fh = fopen(filename, "r");
    	if (fh == 0)
    		return 0; 
     
    	stat(filename, &statbuf);
    	source = (char *) malloc(statbuf.st_size + 1);
    	fread(source, statbuf.st_size, 1, fh);
    	source[statbuf.st_size] = '\0'; 
     
    	return source; 
    }
    One should compile two executables, one for the GPU using the flag -DGPU, and the other for the CPU without any flags. Here are the detailed results on my machine, a MacBook Air with NVIDIA GeForce 320M...
    Code :
    $ clang -framework OpenCL test4.c -DGPU -o gpu
    $ clang -framework OpenCL test4.c -o cpu
    $ ./cpu && ./gpu 
    test4.cl compiled on CPU
    test4.cl compiled on GPU
    [CL_INVALID_COMMAND_QUEUE] : OpenCL Fatal Error : Read caused an error that invalidated the queue (0x100107010). This may be  due to a resource allocation failure at execution time.
    Break on OpenCLFatalBreak to debug.
    [CL_INVALID_COMMAND_QUEUE] : OpenCL Error : clEnqueueReadBuffer failed: Invalid command queue
    Break on OpenCLErrorBreak to debug.
    Assertion failed: (err == CL_SUCCESS), function main, file test4.c, line 103.
    ...
    Any ideas?

  9. #9

    Re: command queue goes dead after clEnqueueNDRangeKernel ?

    I neglected to mention: it's not some simple arithmetic overflow error in the kernel. Using the parameters set in the beginning of the host file, i.e., dim = 8, the double loop in the kernel counts to 10^8 = 100 million. The size_t type is at least a 32-bit unsigned integer, according to the OpenCL spec, which overflows at about 4.2 billion.

  10. #10
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: command queue goes dead after clEnqueueNDRangeKernel ?

    What happens if you replace the kernel with an empty kernel that takes the same parameters and does nothing?
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

Page 1 of 2 12 LastLast

Similar Threads

  1. Command Queue
    By chanakya.sun in forum OpenCL
    Replies: 3
    Last Post: 11-09-2011, 05:39 PM
  2. When is a command queue actually executed?
    By sanderbeckers in forum OpenCL
    Replies: 6
    Last Post: 03-03-2011, 06:28 PM
  3. Problem with Command queue
    By Gregorien in forum OpenCL
    Replies: 0
    Last Post: 03-25-2010, 02:55 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •