problem with openCL/multiple kernels



pelliegia
10-13-2012, 04:52 AM
Hello,

I am new to OpenCL. I am trying to write a program for the knns algorithm, and I want to use 2 kernels that are independent of each other. So I created a host file knnsOpenCL.c and 2 kernel files, kernel_createDist.cl and kernel_ParallelSorting.cl. The first one already runs, and now I have a problem with the second!

In my main I create a device and a context, and I have 2 separate build_program calls, one for each kernel.

The problem is that I get the error "Couldn't create a kernel 2" with code -46, which is CL_INVALID_KERNEL_NAME.

So I am wondering what exactly the problem is! I don't know if I should create 2 devices?

I hope to find the answer here! Any ideas?

chippies
10-14-2012, 08:48 AM
You don't have to create multiple devices. Your process sounds correct; this is an error in your .cl file. Can you please post the line that defines the kernel?

pelliegia
10-14-2012, 04:04 PM
Thank you very much for your reply!
I solved this issue. Now I have another problem.

I want to compare the CPU time and the OpenCL time in my code. My code is the following:

// first I define the cl_event
cl_ulong start, end;
cl_event event;

.......(code)........

/* Create a command queue */
queue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &err);
if (err < 0) {
    perror("Couldn't create a command queue 1");
    printf("error %d\n", err);
    exit(1);
}
........(code)............

err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);
if (err < 0) {
    perror("time error end 1");
    printf("error %d\n", err);
    exit(1);
}
/* Read the kernel's output, i.e. essentially Dist */
err = clEnqueueReadBuffer(queue, d_result_Dist.elements, CL_TRUE, 0, sizeDist, Dist, 0, NULL, NULL);
if (err < 0) {
    perror("Couldn't read the buffer 1");
    printf("error %d\n", err);
    exit(1);
}

err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
if (err < 0) {
    perror("time error start 1");
    printf("error %d\n", err);
    exit(1);
}

/* Profiling timestamps are in nanoseconds; 1.0e-6 converts to milliseconds */
float executionTimeInMilliseconds1 = (end - start) * 1.0e-6f;
printf("[OPENCL] Time elapsed for GPU first kernel: %f ms\n", executionTimeInMilliseconds1);

As you can see, I set CL_QUEUE_PROFILING_ENABLE on my queue, but I get error -7, which is CL_PROFILING_INFO_NOT_AVAILABLE.

So I am wondering where my mistake is...

Have a look at this link. I found it useful for my problem, but I still get this error:

http://stackoverflow.com/questions/10155579/c-vs-opencl-how-to-compare-results-of-time-measurement

chippies
10-15-2012, 08:52 AM
The problem is most likely at the point where you query the end time of the kernel. If the kernel has finished running then this profiling information is available and you should not receive any errors. If the kernel has not finished yet then the profiling info is not available and you will get this error.

Remember that enqueuing a kernel does not block the current host thread. You have to first wait for the kernel to finish, either by repeatedly checking the status of the event object by using clGetEventInfo until the status changes to CL_COMPLETE, or by using clWaitForEvents to block the current host thread until the kernel finishes.

After that, you should be able to get the profiling information.

pelliegia
10-17-2012, 06:42 AM
Thank you very much! I found out what my mistake was! :)


http://parallelis.com/how-to-measure-opencl-kernel-execution-time/

pelliegia
10-17-2012, 10:37 AM
I also have another question.. :(

I want to make sure I understand how size_t global_size[] and size_t local_size[] are defined in 2 dimensions, for the global and local work sizes!

I have to read back from a buffer an array of size QxN from my first kernel, and an array of size Qxk from my second kernel. So I suppose I have to define global_size[] = {Q,N} and local_size[] = {16,16}. I think the local size is similar to the block size in CUDA, so I chose 16x16 as the most appropriate.
For my second kernel I define global_size[] = {Q,k} and local_size[] = {16,16}.

When I run my program I pass 3 arguments: size N, size Q and size k. If I try {N=16, Q=16, k=16} or {N=64, Q=64, k=16} I don't have any problem, but if I try {N=128, Q=128, k=16} or another combination I get error -54, CL_INVALID_WORK_GROUP_SIZE. So I think there is something I haven't understood well. I would be grateful if someone could help me with this issue! I have read many blogs and sites, but I am posting here again because I want an answer to my specific problem!

Thank you anyway!

notzed
10-18-2012, 12:51 AM
You should probably just start a new topic.

Since what you describe should work, you should include code, which will prevent misunderstandings or forgetting to mention important details.
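
Without the actual code it is not possible to say what triggers -54 here, but two general rules are worth checking on the host side before the launch. In OpenCL 1.x each global size must be a multiple of the matching local size, and the number of work-items in a work-group (16x16 = 256 for the sizes above) must not exceed the limit reported by clGetKernelWorkGroupInfo with CL_KERNEL_WORK_GROUP_SIZE. The sketch below is plain C with no OpenCL calls; round_up and sizes_ok are hypothetical helper names, and max_wg_size stands for whatever limit was queried for the kernel.

```c
#include <stddef.h>

/* Round `n` up to the next multiple of `local` (local must be > 0). */
size_t round_up(size_t n, size_t local)
{
    return ((n + local - 1) / local) * local;
}

/* Returns 1 if a 2D launch satisfies both rules, 0 otherwise:
 *  - each global size is a multiple of the matching local size
 *  - the work-group holds at most `max_wg_size` work-items */
int sizes_ok(const size_t global[2], const size_t local[2], size_t max_wg_size)
{
    if (local[0] == 0 || local[1] == 0) return 0;
    if (global[0] % local[0] != 0 || global[1] % local[1] != 0) return 0;
    return local[0] * local[1] <= max_wg_size;
}
```

If a dimension such as Q or N is not a multiple of 16, one option is to launch with round_up(Q, 16) and have the kernel return early for work-items whose get_global_id is outside the real array bounds. If the kernel's work-group limit is below 256, a smaller local size (for example 8x8) or passing NULL as the local size and letting the runtime choose may be worth trying.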